Cloud Installation

Introduction

This section provides general information on deploying ModelOps to a Kubernetes cluster.

Requirements

The following tools are required to complete the installation; they must be downloaded and installed before installing ModelOps:

Optional Tools

These tools are optional, but have been found to be useful:

  • Lens
    • macOS: brew install lens

Additional requirements depend on the Kubernetes cloud platform being used:

Docker for Desktop

Docker for Desktop also requires:

  • Kubernetes must be enabled in the Kubernetes preferences
  • The Docker Engine must allow insecure registries for the local machine (the insecure-registries setting in the Docker Engine preferences)
  • Sufficient resources must be allocated, roughly in line with the minimum hardware listed under Sizing below

Kind

Kind has no additional requirements.

Azure AKS

Azure AKS also requires:

  • Azure CLI Tools must be installed and configured.
    • macOS: brew install azure-cli

OpenShift

OpenShift also requires:

Sizing

ModelOps can be quickly configured for a small, medium, or large installation, while still allowing further customization as needed.

While many values are best left at their defaults, the values listed below have the biggest effect on sizing and so are exposed as install options.

Small - for single scoring

The minimum hardware is a single virtual machine with 6 cores, 30GB memory, and 100GB disk space. Larger configurations allow more concurrent scoring and faster execution. This configuration can run in desktop Kubernetes environments such as Docker for Desktop.

Only Linux-based scoring is supported (a hybrid deployment of Linux and Windows nodes is required for Statistica scoring).

The helm install option --set global.size=small sets the following defaults:

Feature                             Value   Units
Scoring flow memory limit           1.5Gi   Kubernetes quantity
Scoring flow CPU limit              2       Kubernetes quantity
Nexus memory limit                  1.5Gi   Kubernetes quantity
Nexus disk space                    20Gi    Kubernetes quantity
Git server disk space               5Gi     Kubernetes quantity
ModelOps server disk space          5Gi     Kubernetes quantity
ModelOps metrics memory limit       10Gi    Kubernetes quantity
ModelOps metrics sampling interval  30      Seconds
Elasticsearch disk space            10Gi    Kubernetes quantity
Elasticsearch memory limit          2Gi     Kubernetes quantity
Prometheus sampling interval        30s     Seconds

Kubernetes quantities use the notation described at https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity.

Medium - for small teams, the installation default

The minimum hardware is a single virtual machine with 8 CPUs, 32GB memory, and 100GB disk space. Cloud infrastructure configured for cluster scaling is recommended to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 and a maximum of 5 virtual machines, although experience shows that 2 servers are usually sufficient.

Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.

The helm install option --set global.size=medium sets the following defaults:

Feature                             Value   Units
Scoring flow memory limit           2Gi     Kubernetes quantity
Scoring flow CPU limit              4       Kubernetes quantity
Nexus disk space                    20Gi    Kubernetes quantity
Nexus memory limit                  1.5Gi   Kubernetes quantity
Git server disk space               20Gi    Kubernetes quantity
ModelOps server disk space          20Gi    Kubernetes quantity
ModelOps metrics memory limit       15Gi    Kubernetes quantity
ModelOps metrics sampling interval  10      Seconds
Elasticsearch disk space            50Gi    Kubernetes quantity
Elasticsearch memory limit          5Gi     Kubernetes quantity
Prometheus sampling interval        10s     Seconds

Large - for larger teams

The minimum hardware is a single virtual machine with 16 CPUs, 64GB memory, and 500GB disk space. Cloud infrastructure configured for cluster scaling is recommended to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 and a maximum of 10 virtual machines.

Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.

The helm install option --set global.size=large sets the following defaults:

Feature                             Value   Units
Scoring flow memory limit           4Gi     Kubernetes quantity
Scoring flow CPU limit              6       Kubernetes quantity
Nexus disk space                    20Gi    Kubernetes quantity
Nexus memory limit                  1.5Gi   Kubernetes quantity
Git server disk space               100Gi   Kubernetes quantity
ModelOps server disk space          100Gi   Kubernetes quantity
ModelOps metrics memory limit       20Gi    Kubernetes quantity
ModelOps metrics sampling interval  10      Seconds
Elasticsearch disk space            100Gi   Kubernetes quantity
Elasticsearch memory limit          10Gi    Kubernetes quantity
Prometheus sampling interval        10s     Seconds

Further sizing customizations

Each of the above values can be overridden as needed. For example, to increase the git server disk space in the medium configuration, use --set global.size=medium --set global.medium.git.disk=50Gi.

The installation values file listed below shows the actual customization names.
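When several sizing values are overridden, it can be easier to keep them in a Helm values file than to repeat --set flags. The sketch below assumes the key layout of the --set example above (global.size and global.<size>.git.disk); the function name write_sizing_overrides and the file name sizing-overrides.yaml are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a Helm values file that selects a size and
# overrides the git server disk for that size. Key names are assumed to
# mirror the --set example (global.size, global.<size>.git.disk).
write_sizing_overrides() {
  local size="$1" git_disk="$2" file="${3:-sizing-overrides.yaml}"

  # only the three documented sizes are accepted
  case "$size" in
    small|medium|large) ;;
    *)
      echo "unknown size '$size' (expected small, medium or large)" >&2
      return 1
      ;;
  esac

  cat > "$file" <<EOF
global:
  size: ${size}
  ${size}:
    git:
      disk: "${git_disk}"
EOF
}
```

For example:

  $ write_sizing_overrides medium 50Gi
  $ helm upgrade --install modelops modelops-1.0.0.tgz --values sizing-overrides.yaml

Helm merges the file with the chart defaults, so the other medium values are kept.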

Passwords and secrets

To avoid clear-text passwords, Kubernetes provides a Secrets facility. Therefore, before installation, Kubernetes Secrets must be created to hold the passwords required by ModelOps.

These are:

Description         Secret name                    Key name
Elasticsearch       elasticsearch-es-elastic-user  Elasticsearch user name
Git server          git-server                     Git user name
Nexus server        nexus-server                   admin
ModelOps server     modelops-server                admin
Grafana server      grafana-server                 admin
Scoring flow admin  scoring-admin                  admin

If the Elasticsearch secret is not set, Elasticsearch generates a password; see https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-users-and-roles.html.

These secrets may be created via the cloud infrastructure or on the command line using kubectl. For example:

  # Elasticsearch
  #
  # note: apply is used in this case (rather than create) to avoid Elasticsearch
  # re-creating the secret; the manifest is piped straight into kubectl apply so
  # that the password is not written to a local file
  #
  kubectl create secret generic elasticsearch-es-elastic-user --from-literal=elastic=mysecretpassword \
    --namespace modelops --dry-run=client --output=yaml | kubectl apply --filename -

  # Git server
  kubectl create secret generic git-server --from-literal=modelops=mysecretpassword --namespace modelops

  # Nexus server
  kubectl create secret generic nexus-server --from-literal=admin=mysecretpassword --namespace modelops

  # ModelOps server
  kubectl create secret generic modelops-server --from-literal=admin=mysecretpassword --namespace modelops

  # Grafana server
  kubectl create secret generic grafana-server --from-literal=admin=mysecretpassword --namespace modelops

  # Scoring admin
  kubectl create secret generic scoring-admin --from-literal=admin=mysecretpassword --namespace modelops
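The commands above can also be collected into one re-runnable script. The sketch below is an illustration rather than part of the product, and its behaviour is an assumption: it creates the modelops namespace if needed, generates a random password per secret from /dev/urandom, applies each secret through kubectl apply as in the Elasticsearch example, and leaves secrets that already exist unchanged so that a re-run does not rotate passwords under a running installation. The function names random_password and create_modelops_secrets are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 18 random bytes encode to 24 base64 characters with no '=' padding
random_password() {
  head -c 18 /dev/urandom | base64
}

# Hypothetical helper: create every secret from the table above that does not
# exist yet, each with a freshly generated password.
create_modelops_secrets() {
  local ns="${1:-modelops}"
  # secret-name:key-name pairs from the table above; the git key is the
  # git user name (modelops in the chart defaults)
  local pairs=(
    "elasticsearch-es-elastic-user:elastic"
    "git-server:modelops"
    "nexus-server:admin"
    "modelops-server:admin"
    "grafana-server:admin"
    "scoring-admin:admin"
  )
  local pair name key

  # secrets can only be created in an existing namespace
  kubectl get namespace "$ns" >/dev/null 2>&1 || kubectl create namespace "$ns"

  for pair in "${pairs[@]}"; do
    name="${pair%%:*}"
    key="${pair#*:}"
    # keep existing secrets so a re-run does not change passwords in use
    if kubectl get secret "$name" --namespace "$ns" >/dev/null 2>&1; then
      echo "secret $name already exists, leaving it unchanged" >&2
      continue
    fi
    # render the secret client-side and apply it
    kubectl create secret generic "$name" \
      --from-literal="${key}=$(random_password)" \
      --namespace "$ns" --dry-run=client --output=yaml \
      | kubectl apply --filename -
  done
}
```

The generated passwords are not printed; one can be read back later, for example:

  $ kubectl get secret nexus-server --namespace modelops --output=jsonpath='{.data.admin}' | base64 --decode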

It is recommended to install an encryption provider for maximum security - see https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/.
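Where you manage the Kubernetes API server yourself (for example a kind cluster), encryption at rest is configured with an EncryptionConfiguration file passed to kube-apiserver through its --encryption-provider-config flag. The sketch below follows the format described on the Kubernetes page linked above and uses the secretbox provider with a random 32-byte key; the function name and file name are hypothetical, and managed services such as AKS typically do not expose this flag and offer their own encryption options instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write an EncryptionConfiguration that encrypts Secrets
# at rest with the secretbox provider. The trailing identity provider lets the
# API server still read Secrets that were stored before encryption was enabled.
write_encryption_config() {
  local file="${1:-encryption-config.yaml}"
  local key
  # secretbox requires a 32-byte key, base64 encoded
  key=$(head -c 32 /dev/urandom | base64)

  cat > "$file" <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - secretbox:
          keys:
            - name: key1
              secret: ${key}
      - identity: {}
EOF
  # the file holds the encryption key, so restrict it to its owner
  chmod 600 "$file"
}
```

After restarting the API server with the new flag, existing secrets are only encrypted when they are next written; the Kubernetes documentation rewrites them all with:

  $ kubectl get secrets --all-namespaces --output=json | kubectl replace --filename -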

Quick run through

The Helm CLI tool is used to install the ModelOps components into Kubernetes:

  $ helm upgrade --install modelops modelops-1.0.0.tgz --atomic --set global.cloud=aks

This command first installs the required Kubernetes operators. This takes a few seconds, after which the helm command returns with a summary of the installation.

For example:

  NAME: modelops
  LAST DEPLOYED: Thu Oct 29 10:22:35 2020
  NAMESPACE: modelops
  STATUS: deployed
  REVISION: 2
  TEST SUITE: None
  NOTES:
  Thank you for installing modelops configured for aks in kubernetes v1.17.11

  The installation pipeline has started which includes :

  Building tools image
  Building git server image
  Building sbrt base image
  Building modelops server image
  Building modelops scoring flow image
  Adding kubernetes permissions
  The following sub-charts :
    data-channel
    modelops-server
    pmml
    python
    sbrt-base
    scheduling-server
    scoring-flow
    statistica
    tensorflow
    test-datasink
    test-datasource
    tools

  Starting the git server at :

  Internal URL - http://git:3000/
  External web console URL - run kubectl get service git --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting the nexus server at :

  Internal web console URL - http://nexus:8081/
  Maven repository - http://nexus:8081/repository/maven-public/
  Helm repository - http://nexus:8081/repository/helm/
  PyPi proxy - http://nexus:8081/repository/pypi-group
  Container registry - container:8082
  External web console URL - run kubectl get service nexuslb --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting the modelops server at :

  Internal URL - http://modelops-server/
  External URL - run kubectl get service modelops-server --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting the data channel server at :

  Internal URL - http://data-channel/

  Starting the scheduling server at :

  Internal URL - http://scheduling-server/
  External web console URL - run kubectl get service scheduling-server --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting prometheus server at :

  Internal URL - http://prometheus.modelops.svc.cluster.local:9090
  External URL - run kubectl get service prometheus --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting grafana server at :

  Internal URL - http://grafana:3000
  External URL - run kubectl get service grafana --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting elasticsearch server at :

  Internal URL - http://elasticsearch-es-http:9200
  External URL - run kubectl get service elasticsearch-es-http --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Userid is elastic, password elastic

  Starting kibana server at :

  Internal URL - http://kibana-kb-http
  External URL - run kubectl get service kibana-kb-http --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Userid is elastic, password elastic

  Populating the nexus maven repository with TIBCO artifacts

  To track the progress of the installation pipeline run :

  tkn pipelinerun logs installation-2 --follow --namespace modelops

The output depends on the cloud platform and any additional options selected. These details are also displayed with the helm status modelops command.
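On platforms that expose services through a load balancer (AKS in the example above), the External URL commands can be run in a single loop. The sketch below reuses the service names and the jsonpath template from the installation notes, assumes the modelops namespace, and prints a placeholder when a service cannot be read; the function name is hypothetical, and on node port or route based platforms the template would need to change.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the external URL of each ModelOps service,
# using the same jsonpath template as the helm installation notes.
modelops_external_urls() {
  local ns="${1:-modelops}" svc
  local services=(git nexuslb modelops-server scheduling-server prometheus
                  grafana elasticsearch-es-http kibana-kb-http)
  local template='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  for svc in "${services[@]}"; do
    printf '%-24s ' "$svc"
    # a missing service should not abort the loop
    kubectl get service "$svc" --namespace "$ns" --output=jsonpath="$template" \
      || printf '(not available)'
    printf '\n'
  done
}
```

Running modelops_external_urls prints one line per service, for example a grafana line followed by its load balancer URL.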

The zip of Maven artifacts should then be copied into the mavenrepo-0 pod using the kubectl cp command:

  $ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip
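kubectl cp needs the target container to be running, so copying straight after helm returns can fail while mavenrepo-0 is still starting. A minimal sketch that waits for the pod first, assuming the pod name and paths from the command above and the current namespace; the function name, polling interval, and timeouts are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: wait for the Maven repository pod, then copy the zip.
copy_maven_repo() {
  local zip="${1:-kubernetes-installer-1.0.0-mavenrepo.zip}"
  local pod="${2:-mavenrepo-0}"
  local attempts=0

  # kubectl wait fails immediately if the pod does not exist yet,
  # so poll until it has been created (up to about 5 minutes)
  until kubectl get pod "$pod" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    if (( attempts > 60 )); then
      echo "pod $pod was not created" >&2
      return 1
    fi
    sleep 5
  done

  # then wait until its containers are ready to accept the copy
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=10m
  kubectl cp "$zip" "$pod:/tmp/mavenrepo.zip"
}
```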

At this point the installation has started and, as mentioned above, its progress can be monitored with the tkn pipelinerun logs command, using the pipeline run name printed in the helm output (installation-2 in this example). For example:

  $ tkn pipelinerun logs installation-2 --follow --namespace modelops
  [tools-prepare : prepare] Preparing directory for build tools image

  [git-server : build-and-push] INFO[0005] Retrieving image manifest gitea/gitea:1.10.2
  [git-server : build-and-push] INFO[0007] Retrieving image manifest gitea/gitea:1.10.2
  [git-server : build-and-push] INFO[0010] Built cross stage deps: map[]
  [git-server : build-and-push] INFO[0010] Retrieving image manifest gitea/gitea:1.10.2
  [git-server : build-and-push] INFO[0011] Retrieving image manifest gitea/gitea:1.10.2
  ...

Because the installation process can run tasks in parallel, each line of output is prefixed with its task name and coloured.

The installation is complete when the tkn pipelinerun logs installation-2 --follow --namespace modelops command completes. The tkn taskrun list command shows the status of each task:

  $ tkn taskrun list --namespace modelops
  NAME                                            TASK NAME                  STARTED         DURATION    STATUS
  installation-2-modelops-server-scale-tq26d      modelops-server-scale      9 minutes ago   26 seconds  Succeeded
  installation-2-test-datasink-scale-7x6dv        test-datasink-scale        12 minutes ago  43 seconds  Succeeded
  installation-2-test-datasource-scale-n2bn4      test-datasource-scale      12 minutes ago  27 seconds  Succeeded
  installation-2-modelops-server-image-n8pc8      modelops-server-image      15 minutes ago  5 minutes   Succeeded
  installation-2-scoring-flow-image-dt5d6         scoring-flow-image         15 minutes ago  6 minutes   Succeeded
  installation-2-test-datasource-image-ldxrk      test-datasource-image      15 minutes ago  2 minutes   Succeeded
  installation-2-test-datasink-image-45thd        test-datasink-image        15 minutes ago  3 minutes   Succeeded
  installation-2-git-server-scale-pbclk           git-server-scale           16 minutes ago  21 seconds  Succeeded
  installation-2-modelops-server-maven-kc4s4      modelops-server-maven      16 minutes ago  1 minute    Succeeded
  installation-2-test-datasource-maven-m7ghv      test-datasource-maven      16 minutes ago  1 minute    Succeeded
  installation-2-test-datasink-maven-z9pd6        test-datasink-maven        16 minutes ago  1 minute    Succeeded
  installation-2-scoring-flow-maven-znt2n         scoring-flow-maven         16 minutes ago  1 minute    Succeeded
  installation-2-git-server-qq8dl                 git-server                 17 minutes ago  1 minute    Succeeded
  installation-2-test-datasource-prepare-lt6xd    test-datasource-prepare    17 minutes ago  53 seconds  Succeeded
  installation-2-test-datasink-prepare-4bmzj      test-datasink-prepare      17 minutes ago  51 seconds  Succeeded
  installation-2-scoring-flow-prepare-tmh55       scoring-flow-prepare       17 minutes ago  51 seconds  Succeeded
  installation-2-modelops-server-prepare-rm6d5    modelops-server-prepare    17 minutes ago  57 seconds  Succeeded
  installation-2-git-server-prepare-269qk         git-server-prepare         19 minutes ago  1 minute    Succeeded
  installation-2-data-channel-scale-5792c         data-channel-scale         19 minutes ago  1 minute    Succeeded
  installation-2-kibana-gzjr4                     kibana                     19 minutes ago  1 minute    Succeeded
  installation-2-scoring-pipeline-helm-qcx7d      scoring-pipeline-helm      19 minutes ago  1 minute    Succeeded
  installation-2-scheduling-server-scale-m5c5q    scheduling-server-scale    20 minutes ago  1 minute    Succeeded
  installation-2-python-image-j5l5q               python-image               25 minutes ago  7 minutes   Succeeded
  installation-2-tools-image-ddh4r                tools-image                25 minutes ago  6 minutes   Succeeded
  installation-2-statistica-image-mq5cq           statistica-image           25 minutes ago  24 minutes  Succeeded
  installation-2-pmml-image-sw2s6                 pmml-image                 25 minutes ago  6 minutes   Succeeded
  installation-2-data-channel-image-r7hs8         data-channel-image         26 minutes ago  7 minutes   Succeeded
  installation-2-scheduling-server-image-df9jh    scheduling-server-image    26 minutes ago  6 minutes   Succeeded
  installation-2-tensorflow-image-w6ssk           tensorflow-image           26 minutes ago  6 minutes   Succeeded
  installation-2-sbrt-base-image-xk6hg            sbrt-base-image            26 minutes ago  8 minutes   Succeeded
  installation-2-python-maven-ffsbw               python-maven               26 minutes ago  1 minute    Succeeded
  installation-2-statistica-maven-vkxvd           statistica-maven           26 minutes ago  59 seconds  Succeeded
  installation-2-pmml-maven-nvz42                 pmml-maven                 26 minutes ago  1 minute    Succeeded
  installation-2-tools-maven-8bm4r                tools-maven                27 minutes ago  1 minute    Succeeded
  installation-2-scheduling-server-maven-kf2br    scheduling-server-maven    27 minutes ago  1 minute    Succeeded
  installation-2-sbrt-base-maven-hdh67            sbrt-base-maven            27 minutes ago  1 minute    Succeeded
  installation-2-data-channel-maven-hwb7t         data-channel-maven         27 minutes ago  1 minute    Succeeded
  installation-2-tensorflow-maven-m4svb           tensorflow-maven           27 minutes ago  1 minute    Succeeded
  installation-2-pmml-prepare-b8mqn               pmml-prepare               28 minutes ago  1 minute    Succeeded
  installation-2-data-channel-prepare-77sk2       data-channel-prepare       28 minutes ago  47 seconds  Succeeded
  installation-2-tools-prepare-t4fgk              tools-prepare              28 minutes ago  53 seconds  Succeeded
  installation-2-statistica-prepare-prhn9         statistica-prepare         28 minutes ago  1 minute    Succeeded
  installation-2-tensorflow-prepare-kxfm6         tensorflow-prepare         28 minutes ago  48 seconds  Succeeded
  installation-2-scheduling-server-prepare-lfn8l  scheduling-server-prepare  28 minutes ago  54 seconds  Succeeded
  installation-2-python-prepare-5bwlm             python-prepare             28 minutes ago  2 minutes   Succeeded
  installation-2-sbrt-base-prepare-b94jj          sbrt-base-prepare          28 minutes ago  54 seconds  Succeeded
  installation-2-deploy-artifacts-fqckl           deploy-artifacts           33 minutes ago  4 minutes   Succeeded
  installation-2-nexus-repositories-7qvfx         nexus-repositories         33 minutes ago  12 seconds  Succeeded
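To check a finished installation at a glance, the tkn taskrun list output can be filtered for tasks that did not succeed. The sketch below assumes the column layout shown above, with STATUS as the last column; the function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read `tkn taskrun list` output on stdin, skip the
# header, print every task run whose STATUS (last column) is not Succeeded,
# and return non-zero if there were any.
report_unfinished_tasks() {
  awk 'NR > 1 && $NF != "Succeeded" { print; found = 1 }
       END { exit (found ? 1 : 0) }'
}
```

For example:

  $ tkn taskrun list --namespace modelops | report_unfinished_tasks && echo "all tasks succeeded"

Because a Tekton PipelineRun reports a Succeeded condition, a script could also block until the whole run finishes, for example with kubectl wait --for=condition=Succeeded pipelinerun/installation-2 --namespace modelops --timeout=1h.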

The installation pipeline

The installation process is controlled by a Tekton pipeline called installation. This pipeline first installs the following Kubernetes Operators during the pre-install hook:

Kubernetes permissions are added to support Role-based access control (RBAC), security context constraints (SCC) and Streaming discovery.

The following container images are built in Kubernetes:

  • Git server image - used to hold the ModelOps artifacts

Dependent Helm sub-charts also create:

  • General purpose tools image - used for various build and deploy tasks
  • TIBCO Streaming runtime base image - used as a base for further images
  • TIBCO ModelOps Server image - scoring pipeline, flow, and model management
  • TIBCO ModelOps Scoring Server image for each runner - model scoring
  • TIBCO Data Channel Registry - data source and sink registration and discovery
  • TIBCO ModelOps Scheduling Server - job scheduling

The following services are started:

  • Git server
  • TIBCO ModelOps Server
  • Nexus repository configured with:
    • Maven repository, populated with TIBCO artifacts
    • Python repository (both proxy and hosted)
    • Container registry
    • Helm chart repository
  • TIBCO Data Channel Registry
  • TIBCO Scheduling Server

Finally, the installation deploys a Helm chart that is later used to deploy a ModelOps server.

Kubernetes rollout is paused during the installation process and resumed once new container images are available.

Individual pipeline tasks are scheduled according to their dependencies and the available resources.

Old pipeline runs left over from earlier upgrades are cleaned up, so that only the logs for the last 3 installations are kept.
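The explicit task dependencies behind this scheduling can be inspected on the cluster. A small sketch, assuming the pipeline is the Tekton Pipeline resource named installation in the modelops namespace and reading the standard Tekton fields spec.tasks[].name and spec.tasks[].runAfter; the function name is hypothetical, and ordering that Tekton infers from task results is not shown by this listing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list each task of the installation pipeline followed by
# the tasks it must run after (empty when it can start immediately).
installation_task_graph() {
  local ns="${1:-modelops}"
  kubectl get pipeline installation --namespace "$ns" \
    --output=jsonpath='{range .spec.tasks[*]}{.name}{"\t"}{.runAfter}{"\n"}{end}'
}
```

The tkn pipeline describe installation --namespace modelops command gives a similar, formatted overview of the pipeline.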

(Figure: installation pipeline)

Cloud platform differences

Kubernetes features differ between platforms, so the installation process also varies slightly. In general, natively provided features are used in preference to custom-provided ones. These differences are shown below:

Feature                     Docker for Desktop  Kind       OpenShift    AKS            EKS  Nutanix
Operator Lifecycle Manager  Installed           Installed  Provided     Installed      TBD  TBD
Container registry          Nexus               Kind       ImageStream  ACR            TBD  TBD
Network exposure            node port           node port  route        load balancer  TBD  TBD
RBAC supported              No                  Yes        Yes          Yes            TBD  TBD
SCC supported               No                  No         Yes          No             TBD  TBD
Windows images supported    No                  No         No           Yes            TBD  TBD

These differences are controlled by ModelOps Helm chart values, which can be viewed with the helm show values modelops-1.0.0.tgz command. For example:

  $ helm show values modelops-1.0.0.tgz
  #
  # Default values for the chart
  #

  #
  # declare as global so subcharts get the same
  #
  global:

    #
    # empty map for sub charts to populate to extend this chart
    #
    buildsteps: {}
    supportedmodels: {}
    runafter: {}

    #
    # cloud environment
    #
    cloud: docker-for-desktop

    #
    # Container timezone
    #
    timeZone: Europe/London

    #
    # nexus specific settings
    #
    nexus:
      nodePort: 30020
      containerNodePort: 30030
      adminPassword: "admin123"
      hostname: "nexus"

    #
    # The following values are defaulted depending on cloud type :
    #
    # installOLM - install the operator lifecycle manager
    #
    # containerRegistry - base URI of container registry. Use the supplied one
    # if available.
    #
    # containerUsername/containerPassword - if set, used to access container registry
    #
    # networkExposure - mechanism to use to expose network
    #
    # createPVC - if true create persistent volume claim in helm chart, if false
    # the persistent volume claim must be created before installing the chart.
    #
    # selfSignedRegistry - if true then skip tls verification on registry
    #
    # httpRegistry - if true then use http registry
    #
    # adminRBAC - if true, create and use admin service account for admin tasks
    #
    # kubernetesRBAC - if true, create role for streaming pod to manage service (plain kubernetes)
    #
    # openshiftRBAC - if true, create role for streaming pod to manage service (openshift)
    #
    # openshiftOperator - if true, use the OpenShift operator hub
    #
    # windows - if true build windows container (currently statistica scoring server)
    #
    # dnsSuffix - AKS only, set azure annotation for public dns name, i.e. <container>-<dnsSuffix>.<region>.cloudapp.azure.com
    #

    docker-for-desktop:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "localhost:5000"
      networkExposure: "nodePort"
      createPVC: true
      httpRegistry: true
      selfSignedRegistry: false
      adminRBAC: false
      kubernetesRBAC: false
      openshiftRBAC: false
      openshiftOperator: false
      windows: false

    kind:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "kind-registry:5000"
      networkExposure: "nodePort"
      createPVC: true
      selfSignedRegistry: false
      httpRegistry: true
      adminRBAC: true
      kubernetesRBAC: true
      openshiftRBAC: false
      openshiftOperator: false
      windows: false

    openshift:
      installOLM: false
      installMetrics: true
      installLogs: true
      containerRegistry: "image-registry.openshift-image-registry.svc:5000/default"
      networkExposure: "route"
      createPVC: true
      selfSignedRegistry: true
      httpRegistry: false
      adminRBAC: true
      kubernetesRBAC: false
      openshiftRBAC: true
      openshiftOperator: true
      windows: false

    aks:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "myregistry.azurecr.io"
      containerUsername: "azure appid"
      containerPassword: "azure password"
      azureTenantId: "azure tenantId"
      networkExposure: "loadBalancer"
      domain: "tobeset"
      createPVC: false
      selfSignedRegistry: false
      httpRegistry: true
      adminRBAC: true
      kubernetesRBAC: true
      openshiftRBAC: false
      openshiftOperator: false
      windows: true

    #
    # sizing details
    #
    small:
      nexus:
        disk: "20Gi"
        memory: "1.5Gi"
      git:
        disk: "5Gi"
      modelopsserver:
        disk: "5Gi"
      modelopsmetrics:
        memory: "10Gi"
        interval: "30"
      elasticsearch:
        disk: "5Gi"
        memory: "2Gi"
      prometheus:
        interval: "30s"

    medium:
      nexus:
        disk: "20Gi"
        memory: "1.5Gi"
      git:
        disk: "20Gi"
      modelopsserver:
        disk: "20Gi"
      modelopsmetrics:
        memory: "15Gi"
        interval: "10"
      elasticsearch:
        disk: "20Gi"
        memory: "5Gi"
      prometheus:
        interval: "10s"

    large:
      nexus:
        disk: "20Gi"
        memory: "1.5Gi"
      git:
        disk: "100Gi"
      modelopsserver:
        disk: "100Gi"
      modelopsmetrics:
        memory: "20Gi"
        interval: "10"
      elasticsearch:
        disk: "100Gi"
        memory: "10Gi"
      prometheus:
        interval: "10s"
    #
    # hence the chart may be installed :
    #
    # helm install modelops target/helm/repo/modelops-1.0.0.tgz --set global.cloud=openshift
    #
    # or override individual settings
    #
    # helm install modelops target/helm/repo/modelops-1.0.0.tgz --set global.cloud=openshift --set global.openshift.createPVC=true
    #

    #
    # auto start deployments ( after image is built )
    #
    autostartdeployments:
      tools: false
      statistica: false
      pmml: false
      tensorflow: false
      scoring-flow: false
      data-channel: true
      scheduling-server: true
      modelops-server: true

    #
    # git specific settings
    #
    # if azureDiskURL is set, use azureDisk with that URL
    #
    git:
      nodePort: 30010
      username: "modelops"
      password: "modelops"
      repository: "scoringpipelines"
      azureDiskURL: ""

    #
    # modelops-server specific settings
    #
    # if azureDiskURL is set, use azureDisk with that URL
    #
    modelopsserver:
      nodePort: 30040
      username: "admin"
      password: "admin"
      azureDiskURL: ""

    #
    # prometheus specific settings
    #
    prometheus:
      nodePort: 30050

    #
    # grafana specific settings
    #
    grafana:
      nodePort: 30060

    #
    # elasticsearch specific settings
    #
    elasticsearch:
      nodePort: 30070
      password: "elastic"

    #
    # kibana specific settings
    #
    kibana:
      nodePort: 30080

    #
    # scheduling-server specific settings
    #
    schedulingserver:
      nodePort: 30090
      logLevel: "INFO"

    #
    # data channel specific settings
    #
    datachannel:
      nodePort: 30100

So, to choose the defaults for a given environment, just set global.cloud to the right environment:

  $ helm install modelops modelops-1.0.0.tgz --set global.cloud=kind

However, individual settings can be overridden if required, using the global.<cloud name>.<parameter> format. For example:

  $ helm install modelops modelops-1.0.0.tgz --set global.cloud=docker-for-desktop \
      --set global.docker-for-desktop.containerRegistry=myserver:30030
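A small wrapper can catch a mistyped global.cloud value before Helm is called. The sketch below accepts only the cloud names that have a section in the values file above (docker-for-desktop, kind, openshift, and aks), uses the upgrade --install form described under Upgrading, and passes any further arguments, such as extra --set overrides, straight to helm; the function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: install or upgrade ModelOps for a known cloud type,
# forwarding any extra arguments to helm.
modelops_install() {
  local cloud="${1:-}"
  if [[ $# -gt 0 ]]; then shift; fi

  case "$cloud" in
    docker-for-desktop|kind|openshift|aks) ;;
    *)
      echo "unsupported cloud '$cloud' (expected docker-for-desktop, kind, openshift or aks)" >&2
      return 1
      ;;
  esac

  helm upgrade --install modelops modelops-1.0.0.tgz --atomic \
    --set "global.cloud=${cloud}" "$@"
}
```

For example, modelops_install docker-for-desktop --set global.docker-for-desktop.containerRegistry=myserver:30030 runs the same override as above.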

Some examples are shown in the sections below:


Docker for Desktop

To install Docker for Desktop:

  1. Follow the instructions at https://www.docker.com/products/docker-desktop
  2. Enable Kubernetes in the GUI
  3. Ensure there are sufficient resources allocated

ModelOps is installed using these commands:

  # Install the ModelOps Helm chart
  $ cd ${TIBCO_EP_HOME}/ep-modelops/kubernetes-installer/target/helm/repo
  $ helm install modelops \
      modelops-1.0.0.tgz \
      --set global.cloud=docker-for-desktop \
      --set global.docker-for-desktop.containerRegistry=$(hostname -s):30030

  # Populate the Maven repository inside the Kubernetes cluster
  $ cd ${TIBCO_EP_HOME}/ep-modelops/kubernetes-installer/target/
  $ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip \
      mavenrepo-0:/tmp/mavenrepo.zip

Kind

To install kind:

  1. Install Docker for Desktop as above
  2. Follow the instructions at https://kind.sigs.k8s.io/docs/user/quick-start/
  3. Follow the instructions at https://kind.sigs.k8s.io/docs/user/local-registry/ to configure the registry

Once installed, a typical ModelOps installation command for kind is:

  $ helm install modelops modelops-1.0.0.tgz --set global.cloud=kind
  $ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

OpenShift CodeReady Containers (CRC)

To install CodeReady Containers:

  1. Follow the instructions at https://cloud.redhat.com/openshift/install/crc/installer-provisioned
  2. Start with crc start --cpus 6 --memory 16384 --pull-secret-file ~/pull-secret.txt

Then a typical ModelOps installation command for OpenShift CodeReady Containers is:

  $ helm install modelops modelops-1.0.0.tgz --set global.cloud=openshift
  $ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

AKS

To create an AKS cluster:

  1. Follow the instructions at https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough
  2. To support Windows containers, follow the instructions at https://docs.microsoft.com/en-us/azure/aks/windows-container-cli#add-a-windows-server-node-pool

An example AKS script is:

  #!/usr/bin/env bash
  set -euo pipefail

  namespace=modelops
  group=modelops
  cluster=modelops
  acr=registry                  # ACR names must be globally unique
  azure_winpassword="set this"
  azure_appId="set this"
  azure_password="set this"
  azure_tenantId="set this"

  #
  # create cluster
  #
  az group create --name "${group}" --location uksouth
  az acr create --name "${acr}" --resource-group "${group}" --sku basic
  az aks create \
    --resource-group "${group}" \
    --service-principal "${azure_appId}" \
    --client-secret "${azure_password}" \
    --name "${cluster}" \
    --node-count 1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5 \
    --no-ssh-key \
    --windows-admin-password "${azure_winpassword}" \
    --windows-admin-username azureuser \
    --vm-set-type VirtualMachineScaleSets \
    --node-vm-size Standard_B8ms \
    --network-plugin azure \
    --attach-acr "${acr}"

  #
  # add windows pool and set taint to avoid default use
  # (some operators will attempt to schedule linux pods on windows nodes)
  #
  az aks nodepool add \
    --resource-group "${group}" \
    --cluster-name "${cluster}" \
    --os-type Windows \
    --name npwin \
    --node-count 1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 2 \
    --node-vm-size Standard_B8ms \
    --node-taints os=windows:NoSchedule

  #
  # create any persistent volumes ( external to kubernetes / namespace )
  #
  nodegroup=$(az aks show --resource-group "${group}" --name "${cluster}" --query nodeResourceGroup -o tsv)
  az disk create \
    --resource-group "${nodegroup}" \
    --name modelops-server \
    --size-gb 5 \
    --query id --output tsv
  az disk create \
    --resource-group "${nodegroup}" \
    --name git-server \
    --size-gb 5 \
    --query id --output tsv

  #
  # point kubectl at the new cluster and make the ModelOps namespace the default
  #
  az aks get-credentials --resource-group "${group}" --name "${cluster}"
  kubectl create namespace "${namespace}"
  kubectl config set-context --current --namespace="${namespace}"

Once installed, a typical ModelOps installation command for AKS is:

  $ helm install modelops modelops-1.0.0.tgz --atomic --set global.cloud=aks \
      --set global.aks.containerRegistry=${acr}.azurecr.io \
      --set global.aks.containerUsername=${azure_appId} \
      --set global.aks.containerPassword=${azure_password} \
      --set global.aks.azureTenantId=${azure_tenantId}
  $ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

Note that the chosen Azure registry URL and authentication details must be provided.

Upgrading

To upgrade the ModelOps components use:

  $ helm upgrade modelops modelops-1.0.0.tgz ...

However, it is common practice to use the same command for installation and upgrades:

  $ helm upgrade modelops modelops-1.0.0.tgz --install ...

When the installation is upgraded, the installation pipeline is re-executed and a rollout restart is performed on existing pods.
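Because each upgrade starts a new run of the installation pipeline, scripts that upgrade ModelOps may want to block until that run completes. A minimal sketch, assuming the runs are named installation-<n> as in the examples above and that the PipelineRun reports the Tekton Succeeded condition; the function name and the one-hour timeout are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: wait for the most recent installation PipelineRun.
wait_for_installation() {
  local ns="${1:-modelops}" run

  # oldest first, so the newest installation run is the last match
  run=$(kubectl get pipelinerun --namespace "$ns" \
          --sort-by=.metadata.creationTimestamp --output=name \
        | { grep '/installation-' || true; } | tail -n 1)

  if [[ -z "$run" ]]; then
    echo "no installation pipeline run found in namespace $ns" >&2
    return 1
  fi

  echo "waiting for $run" >&2
  kubectl wait --for=condition=Succeeded "$run" --namespace "$ns" --timeout=1h
}
```

For example, helm upgrade modelops modelops-1.0.0.tgz --install ... followed by wait_for_installation returns once the new run has succeeded, or fails after the timeout.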

Uninstalling

To uninstall the ModelOps components use:

  $ helm uninstall modelops

Note that this doesn’t uninstall the Kubernetes operators (so that a further install is faster).

To uninstall everything and start from scratch, reset the Kubernetes cluster. For example, in Docker for Desktop, use the Reset Kubernetes Cluster button in the Kubernetes preferences.

Troubleshooting

Always ensure that the Kubernetes context is the one you expect. For example, with Docker for Desktop:

  $ kubectl config current-context
  docker-desktop

The context is also displayed in the Docker for Desktop UI.
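Scripts that run helm or kubectl can make this check automatic. The sketch below stops unless the current context matches the expected name; the function name is hypothetical, and docker-desktop is simply the Docker for Desktop context shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: fail unless kubectl points at the expected context.
require_context() {
  local expected="$1" current
  current=$(kubectl config current-context)
  if [[ "$current" != "$expected" ]]; then
    echo "current kubectl context is '$current', expected '$expected'" >&2
    return 1
  fi
}
```

For example:

  $ require_context docker-desktop && helm upgrade --install modelops modelops-1.0.0.tgz --set global.cloud=docker-for-desktop

If the check fails, kubectl config use-context docker-desktop switches to the expected context.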