Cloud Installation

Introduction

This section provides general information on deploying ModelOps to a Kubernetes cluster.

Requirements

The following tools are required to complete the installation. They must be downloaded and installed before installing ModelOps:

Additional requirements depend on the Kubernetes cloud platform being used:

Optional Tools

These tools are optional, but have been found to be useful:

  • Lens
    • macOS: brew install lens

Docker for desktop also requires:

  • Kubernetes must be enabled (Kubernetes preferences)
  • Docker Engine must enable insecure registries for the local machine (Docker Engine preferences)
  • The required resources must be allocated (Resources preferences; see Sizing below)

Kind

Kind has no additional requirements.

Azure AKS

Azure AKS also requires:

  • Azure CLI Tools must be installed and configured.
    • macOS: brew install azure-cli

OpenShift

OpenShift also requires:

Sizing

ModelOps can be quickly configured for a small, medium or large installation whilst also allowing for further customizations as needed.

Whilst many values are best kept at their defaults, the values listed below have the biggest effect on sizing and so are exposed as install options.

Small - for single scoring

Minimum hardware is a single virtual machine with 6 cores, 30GB memory and 100GB disk space. Larger configurations allow for more concurrent scoring and faster execution. This size can be run in desktop Kubernetes environments such as Docker for Desktop.

Linux-based scoring only (a hybrid deployment of Linux and Windows nodes is required for Statistica scoring).

The helm install option, --set global.size=small, defaults to :

Feature                              Value   Units
Scoring flow memory limit            1.5Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Scoring flow cpu limit               2       See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Nexus disk space                     20Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Nexus memory limit                   1.5Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Git server disk space                5Gi     See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps server disk space           5Gi     See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps metrics memory limit        10Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps metrics sampling interval   30      Seconds
Elastic search disk space            10Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Elastic search memory limit          2Gi     See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Prometheus sampling interval         30s     Seconds

Medium - for small teams, the installation default

Minimum hardware is a single virtual machine with 8 CPUs, 32GB memory and 100GB disk space. Cloud infrastructure configured for cluster scaling is recommended to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 virtual machine and a maximum of 5 virtual machines, although experience shows that 2 servers are usually sufficient.

Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.

The helm install option, --set global.size=medium, defaults to :

Feature                              Value   Units
Scoring flow memory limit            2Gi     See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Scoring flow cpu limit               4       See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Nexus disk space                     20Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Nexus memory limit                   1.5Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Git server disk space                20Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps server disk space           20Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps metrics memory limit        15Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps metrics sampling interval   10      Seconds
Elastic search disk space            50Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Elastic search memory limit          5Gi     See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Prometheus sampling interval         10s     Seconds

Large - for larger teams

Minimum hardware is a single virtual machine with 16 CPUs, 64GB memory and 500GB disk space. Cloud infrastructure configured for cluster scaling is recommended to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 virtual machine and a maximum of 10 virtual machines.

Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.

The helm install option, --set global.size=large, defaults to :

Feature                              Value   Units
Scoring flow memory limit            4Gi     See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Scoring flow cpu limit               6       See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Nexus disk space                     20Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Nexus memory limit                   1.5Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Git server disk space                100Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps server disk space           100Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps metrics memory limit        20Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
ModelOps metrics sampling interval   10      Seconds
Elastic search disk space            100Gi   See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Elastic search memory limit          10Gi    See https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity
Prometheus sampling interval         10s     Seconds
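
The minimum hardware figures quoted for the three sizes can be captured in a small helper when deciding which size a machine can support. The following is a minimal sketch only - the function names are illustrative and not part of ModelOps, and only the CPU and memory minimums stated above are compared (disk space is not checked) :

```shell
#!/bin/sh
# Sketch: pick the largest ModelOps size whose stated minimum hardware fits.
# Function names are hypothetical; the figures are the minimums quoted above.

# Print "cpus memory_gb" for a given size.
minimum_for_size() {
  case "$1" in
    small)  echo "6 30" ;;
    medium) echo "8 32" ;;
    large)  echo "16 64" ;;
    *)      return 1 ;;
  esac
}

# Print the largest size that fits the given CPU count and memory (in GB),
# or "none" if even the small size does not fit.
largest_size_for() {
  cpus="$1"
  mem_gb="$2"
  best="none"
  for size in small medium large; do
    min=$(minimum_for_size "$size")
    min_cpus=${min% *}
    min_mem=${min#* }
    if [ "$cpus" -ge "$min_cpus" ] && [ "$mem_gb" -ge "$min_mem" ]; then
      best="$size"
    fi
  done
  echo "$best"
}

# Example: a virtual machine with 8 CPUs and 32GB memory.
largest_size_for 8 32
```

The result can then be passed to helm as --set global.size=<size>.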

Further sizing customizations

Each of the above values can be overridden as needed. For example, to increase the Git server disk space with the medium configuration, use --set global.size=medium --set global.medium.git.disk=50Gi.

The installation values file listed below shows the actual customization names.
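
As a minimal sketch of how a size and a single override fit together on the command line, the script below composes the flags from shell variables and prints the resulting helm command for review rather than running it. The variable names are illustrative; the chart file name and the global.medium.git.disk value are taken from the examples in this section :

```shell
#!/bin/sh
# Sketch: compose sizing flags for a ModelOps install.
# Variable names are hypothetical; values mirror the example in the text.
size="medium"
git_disk="50Gi"

# Select the size, then override one value within that size.
set_flags="--set global.size=${size} --set global.${size}.git.disk=${git_disk}"

# Print the command instead of executing it, so it can be checked first.
echo "helm upgrade --install modelops modelops-1.0.0.tgz ${set_flags}"
```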

Passwords and secrets

In order to avoid clear text passwords, Kubernetes provides a Secrets facility. Prior to installation, Kubernetes Secrets must be created to hold the passwords required by ModelOps.

These are :

Description          Secret name                     Key name                   Comments
Elastic search       elasticsearch-es-elastic-user   Elastic search user name   See https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-users-and-roles.html - if not set elastic search generates a password
Git server           git-server                      git user name
Nexus server         nexus-server                    admin
ModelOps server      modelops-server                 admin
Grafana server       grafana-server                  admin
Scoring flow admin   scoring-admin                   admin

These secrets may be created via the cloud infrastructure or on the command-line using kubectl. For example :

# elastic search
#
# note in this case we use apply to avoid elastic search re-creating the secret
#
kubectl create secret generic elasticsearch-es-elastic-user --from-literal=elastic=mysecretpassword --namespace modelops --dry-run=client --output=yaml 2>/dev/null > secret.yaml
kubectl apply --filename secret.yaml

# git server
#
kubectl create secret generic git-server --from-literal=modelops=mysecretpassword --namespace modelops

# nexus server
#
kubectl create secret generic nexus-server --from-literal=admin=mysecretpassword --namespace modelops

# modelops server
#
kubectl create secret generic modelops-server --from-literal=admin=mysecretpassword --namespace modelops

# grafana server
#
kubectl create secret generic grafana-server --from-literal=admin=mysecretpassword --namespace modelops

# scoring admin
#
kubectl create secret generic scoring-admin --from-literal=admin=mysecretpassword --namespace modelops

It is recommended to install an encryption provider for maximum security - see https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/.
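
Before starting the installation it can be useful to confirm that every secret in the table above exists. The following is a minimal sketch that assumes the modelops namespace used in the examples and that kubectl is configured for the target cluster; the function name is illustrative :

```shell
#!/bin/sh
# Sketch: report which of the required secrets are missing from a namespace.
# The secret names come from the table above; the function name is hypothetical.
namespace="modelops"
required_secrets="elasticsearch-es-elastic-user git-server nexus-server modelops-server grafana-server scoring-admin"

# Print the name of each required secret that kubectl cannot find.
missing_secrets() {
  for name in $required_secrets; do
    if ! kubectl get secret "$name" --namespace "$1" >/dev/null 2>&1; then
      echo "$name"
    fi
  done
}

# Only query the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  missing=$(missing_secrets "$namespace")
  if [ -n "$missing" ]; then
    echo "Missing secrets in ${namespace}:"
    echo "$missing"
  else
    echo "All required secrets exist in ${namespace}"
  fi
fi
```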

Quick run through

The Helm CLI tool is used to install the ModelOps components to Kubernetes :

$ helm upgrade --install modelops modelops-1.0.0.tgz --atomic --set global.cloud=aks

This command first installs required Kubernetes operators - this takes a few seconds after which the helm command returns with a summary of the installation.

For example :

NAME: modelops
LAST DEPLOYED: Thu Oct 29 10:22:35 2020
NAMESPACE: modelops
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
Thank you for installing modelops configured for aks in kubernetes v1.17.11


The installation pipeline has started which includes :

  Building tools image
  Building git server image
  Building sbrt base image
  Building modelops server image
  Building modelops scoring flow image
  Adding kubernetes permissions
  The following sub-charts :
    data-channel
    modelops-server
    pmml
    python
    sbrt-base
    scheduling-server
    scoring-flow
    statistica
    tensorflow
    test-datasink
    test-datasource
    tools


  Starting the git server at :

    Internal URL - http://git:3000/
    External web console URL - run kubectl get service git --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting the nexus server at :

    Internal web console URL - http://nexus:8081/
      Maven repository - http://nexus:8081/repository/maven-public/
      Helm repository - http://nexus:8081/repository/helm/
      PyPi proxy - http://nexus:8081/repository/pypi-group
      Container registry - container:8082
    External web console URL - run kubectl get service nexuslb --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting the modelops server at :

    Internal URL - http://modelops-server/
    External URL - run kubectl get service modelops-server --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting the data channel server at :

    Internal URL - http://data-channel/

  Starting the scheduling server at :

    Internal URL - http://scheduling-server/
    External web console URL - run kubectl get service scheduling-server --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting prometheus server at :

    Internal URL - http://prometheus.modelops.svc.cluster.local:9090
    External URL - run kubectl get service prometheus --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting grafana server at :

    Internal URL - http://grafana:3000
    External URL - run kubectl get service grafana --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

  Starting elasticsearch server at :

    Internal URL - http://elasticsearch-es-http:9200
    External URL - run kubectl get service elasticsearch-es-http --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

    Userid is elastic, password elastic

  Starting kibana server at :

    Internal URL - http://kibana-kb-http
    External URL - run kubectl get service kibana-kb-http --namespace modelops -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

    Userid is elastic, password elastic



  Populating the nexus maven repository with TIBCO artifacts

To track the progress of the installation pipeline run :

  tkn pipelinerun logs installation-2 --follow --namespace modelops

The output depends on the cloud platform and any additional options selected. These details are also displayed with the helm status modelops command.
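
The external URL commands printed in the notes all share the same jsonpath expression, so they can be run in one loop. This is a minimal sketch assuming load balancer exposure (as in the aks example above) and the service names shown in that sample output; other platforms expose services differently and the names may vary with the options selected :

```shell
#!/bin/sh
# Sketch: print the external URL of each service listed in the helm notes.
# Service names and the jsonpath expression are copied from the sample output;
# the function name is hypothetical.
namespace="modelops"
services="git nexuslb modelops-server scheduling-server prometheus grafana elasticsearch-es-http kibana-kb-http"
url_template='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].targetPort}'

# Print the external URL for one service.
external_url() {
  kubectl get service "$1" --namespace "$namespace" -o jsonpath="$url_template"
}

# Only query the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  for service in $services; do
    printf '%s: %s\n' "$service" "$(external_url "$service" 2>/dev/null)"
  done
fi
```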

The zip of Maven artifacts should be copied using the kubectl cp command :

$ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

At this point the installation has been started and, as mentioned above, the status of the installation can be monitored with tkn pipelinerun logs installation -f. For example :

$ tkn pipelinerun logs installation-2 --follow --namespace modelops
[tools-prepare : prepare] Preparing directory for build tools image

[git-server : build-and-push] INFO[0005] Retrieving image manifest gitea/gitea:1.10.2
[git-server : build-and-push] INFO[0007] Retrieving image manifest gitea/gitea:1.10.2
[git-server : build-and-push] INFO[0010] Built cross stage deps: map[]
[git-server : build-and-push] INFO[0010] Retrieving image manifest gitea/gitea:1.10.2
[git-server : build-and-push] INFO[0011] Retrieving image manifest gitea/gitea:1.10.2
....

The installation process can run tasks in parallel - hence the output is prefixed with the task and lines are coloured.

The installation is completed when the tkn pipelinerun logs installation --follow --namespace modelops command completes. The tkn taskrun list command shows the task status :

$ tkn taskrun list --namespace modelops
 NAME                                             TASK NAME                   STARTED          DURATION     STATUS
 installation-2-modelops-server-scale-tq26d       modelops-server-scale       9 minutes ago    26 seconds   Succeeded
 installation-2-test-datasink-scale-7x6dv         test-datasink-scale         12 minutes ago   43 seconds   Succeeded
 installation-2-test-datasource-scale-n2bn4       test-datasource-scale       12 minutes ago   27 seconds   Succeeded
 installation-2-modelops-server-image-n8pc8       modelops-server-image       15 minutes ago   5 minutes    Succeeded
 installation-2-scoring-flow-image-dt5d6          scoring-flow-image          15 minutes ago   6 minutes    Succeeded
 installation-2-test-datasource-image-ldxrk       test-datasource-image       15 minutes ago   2 minutes    Succeeded
 installation-2-test-datasink-image-45thd         test-datasink-image         15 minutes ago   3 minutes    Succeeded
 installation-2-git-server-scale-pbclk            git-server-scale            16 minutes ago   21 seconds   Succeeded
 installation-2-modelops-server-maven-kc4s4       modelops-server-maven       16 minutes ago   1 minute     Succeeded
 installation-2-test-datasource-maven-m7ghv       test-datasource-maven       16 minutes ago   1 minute     Succeeded
 installation-2-test-datasink-maven-z9pd6         test-datasink-maven         16 minutes ago   1 minute     Succeeded
 installation-2-scoring-flow-maven-znt2n          scoring-flow-maven          16 minutes ago   1 minute     Succeeded
 installation-2-git-server-qq8dl                  git-server                  17 minutes ago   1 minute     Succeeded
 installation-2-test-datasource-prepare-lt6xd     test-datasource-prepare     17 minutes ago   53 seconds   Succeeded
 installation-2-test-datasink-prepare-4bmzj       test-datasink-prepare       17 minutes ago   51 seconds   Succeeded
 installation-2-scoring-flow-prepare-tmh55        scoring-flow-prepare        17 minutes ago   51 seconds   Succeeded
 installation-2-modelops-server-prepare-rm6d5     modelops-server-prepare     17 minutes ago   57 seconds   Succeeded
 installation-2-git-server-prepare-269qk          git-server-prepare          19 minutes ago   1 minute     Succeeded
 installation-2-data-channel-scale-5792c          data-channel-scale          19 minutes ago   1 minute     Succeeded
 installation-2-kibana-gzjr4                      kibana                      19 minutes ago   1 minute     Succeeded
 installation-2-scoring-pipeline-helm-qcx7d       scoring-pipeline-helm       19 minutes ago   1 minute     Succeeded
 installation-2-scheduling-server-scale-m5c5q     scheduling-server-scale     20 minutes ago   1 minute     Succeeded
 installation-2-python-image-j5l5q                python-image                25 minutes ago   7 minutes    Succeeded
 installation-2-tools-image-ddh4r                 tools-image                 25 minutes ago   6 minutes    Succeeded
 installation-2-statistica-image-mq5cq            statistica-image            25 minutes ago   24 minutes   Succeeded
 installation-2-pmml-image-sw2s6                  pmml-image                  25 minutes ago   6 minutes    Succeeded
 installation-2-data-channel-image-r7hs8          data-channel-image          26 minutes ago   7 minutes    Succeeded
 installation-2-scheduling-server-image-df9jh     scheduling-server-image     26 minutes ago   6 minutes    Succeeded
 installation-2-tensorflow-image-w6ssk            tensorflow-image            26 minutes ago   6 minutes    Succeeded
 installation-2-sbrt-base-image-xk6hg             sbrt-base-image             26 minutes ago   8 minutes    Succeeded
 installation-2-python-maven-ffsbw                python-maven                26 minutes ago   1 minute     Succeeded
 installation-2-statistica-maven-vkxvd            statistica-maven            26 minutes ago   59 seconds   Succeeded
 installation-2-pmml-maven-nvz42                  pmml-maven                  26 minutes ago   1 minute     Succeeded
 installation-2-tools-maven-8bm4r                 tools-maven                 27 minutes ago   1 minute     Succeeded
 installation-2-scheduling-server-maven-kf2br     scheduling-server-maven     27 minutes ago   1 minute     Succeeded
 installation-2-sbrt-base-maven-hdh67             sbrt-base-maven             27 minutes ago   1 minute     Succeeded
 installation-2-data-channel-maven-hwb7t          data-channel-maven          27 minutes ago   1 minute     Succeeded
 installation-2-tensorflow-maven-m4svb            tensorflow-maven            27 minutes ago   1 minute     Succeeded
 installation-2-pmml-prepare-b8mqn                pmml-prepare                28 minutes ago   1 minute     Succeeded
 installation-2-data-channel-prepare-77sk2        data-channel-prepare        28 minutes ago   47 seconds   Succeeded
 installation-2-tools-prepare-t4fgk               tools-prepare               28 minutes ago   53 seconds   Succeeded
 installation-2-statistica-prepare-prhn9          statistica-prepare          28 minutes ago   1 minute     Succeeded
 installation-2-tensorflow-prepare-kxfm6          tensorflow-prepare          28 minutes ago   48 seconds   Succeeded
 installation-2-scheduling-server-prepare-lfn8l   scheduling-server-prepare   28 minutes ago   54 seconds   Succeeded
 installation-2-python-prepare-5bwlm              python-prepare              28 minutes ago   2 minutes    Succeeded
 installation-2-sbrt-base-prepare-b94jj           sbrt-base-prepare           28 minutes ago   54 seconds   Succeeded
 installation-2-deploy-artifacts-fqckl            deploy-artifacts            33 minutes ago   4 minutes    Succeeded
 installation-2-nexus-repositories-7qvfx          nexus-repositories          33 minutes ago   12 seconds   Succeeded

The installation pipeline

The install process is controlled via a Tekton pipeline called installation. This pipeline first installs the following Kubernetes Operators during the pre-install hook :

Kubernetes permissions are added to support Role-based access control (RBAC), security context constraints (SCC) and Streaming discovery.

The following container images are built in Kubernetes :

  • GIT server image - used to hold the ModelOps artifacts

Dependent helm sub charts also create :

  • General purpose tools image - used for various build and deploy tasks
  • TIBCO Streaming runtime base image - used as a base for further images
  • TIBCO ModelOps Server image - scoring pipeline, flow, and model management
  • TIBCO ModelOps Scoring Server image for each runner - model scoring
  • TIBCO Data Channel Registry - data source and sink registration and discovery
  • TIBCO ModelOps Scheduling Server - job scheduling

The following services are started :

  • GIT server
  • TIBCO ModelOps Server
  • Nexus repository configured with :
    • Maven repository, populated with TIBCO artifacts
    • Python repository (both proxy and hosted)
    • Container registry
    • Helm chart repository
  • TIBCO Data Channel Registry
  • TIBCO Scheduling Server

Finally the installation deploys a helm chart used to later deploy a ModelOps server.

Kubernetes rollout is paused during the installation process and resumed once new container images are available.

Individual pipeline tasks are scheduled by dependency and available resources.

Old pipeline runs left over from earlier upgrades are cleaned up so that only the logs for the last 3 installations are kept.

[Figure: installation pipeline]

Cloud platform differences

Kubernetes features differ between platforms and so the installation process also varies slightly. In general, natively provided features are used in preference to custom provided features. These differences are shown below :

Feature                      Docker for desktop   Kind        OpenShift     AKS             EKS   Nutanix
Operator Lifecycle Manager   Installed            Installed   Provided      Installed       TBD   TBD
Container registry           Nexus                Kind        ImageStream   ACR             TBD   TBD
Network exposure             node port            node port   route         load balancer   TBD   TBD
RBAC supported               No                   Yes         Yes           Yes             TBD   TBD
SCC supported                No                   No          Yes           No              TBD   TBD
Windows images supported     No                   No          No            Yes             TBD   TBD

These differences are controlled via ModelOps helm chart values parameters - these can be viewed with the helm show values modelops-1.0.0.tgz command, for example :

$ helm show values modelops-1.0.0.tgz
#
# Default values for the chart
#

#
# declare as global so subcharts get the same
#
global:

#
# empty map for sub charts to populate to extend this chart
#
  buildsteps: {}
  supportedmodels: {}
  runafter: {}

#
# cloud environment
#
  cloud: docker-for-desktop

#
# Container timezone
#
  timeZone: Europe/London

#
# nexus specific settings
#
  nexus:
    nodePort:           30020
    containerNodePort:  30030
    adminPassword:      "admin123"
    hostname:           "nexus"

#
# The following values are defaulted depending on cloud type :
#
# installOLM - install the operator lifecycle manager
#
# containerRegistry - base URI of container registry.  Use the supplied one
#   if available.
#
# containerUsername/containerPassword - if set, used to access container registry
#
# networkExposure - mechanism to use to expose network
#
# createPVC - if true create persistent volume claim in helm chart, if false
#   the persistent volume claim must be created before installing the chart.
#
# selfSignedRegistry - if true then skip tls verification on registry
#
# httpRegistry - if true then use http registry
#
# adminRBAC - if true, create and use admin service account for admin tasks
#
# kubernetesRBAC - if true, create role for streaming pod to manage service (plain kubernetes)
#
# openshiftRBAC - if true, create role for streaming pod to manage service (openshift)
#
# openshiftOperator - if true, use the OpenShift operator hub
#
# windows - if true build windows container (currently statistica scoring server)
#
# dnsSuffix - AKS only, set azure annotation for public dns name, ie <container>-<dnsSuffix>.<region>.cloudapp.azure.com
#

  docker-for-desktop:
    installOLM:         true
    installMetrics:     true
    installLogs:        true
    containerRegistry:  "localhost:5000"
    networkExposure:    "nodePort"
    createPVC:          true
    httpRegistry:       true
    selfSignedRegistry: false
    adminRBAC:          false
    kubernetesRBAC:     false
    openshiftRBAC:      false
    openshiftOperator:  false
    windows:            false

  kind:
    installOLM:         true
    installMetrics:     true
    installLogs:        true
    containerRegistry:  "kind-registry:5000"
    networkExposure:    "nodePort"
    createPVC:          true
    selfSignedRegistry: false
    httpRegistry:       true
    adminRBAC:          true
    kubernetesRBAC:     true
    openshiftRBAC:      false
    openshiftOperator:  false
    windows:            false

  openshift:
    installOLM:         false
    installMetrics:     true
    installLogs:        true
    containerRegistry:  "image-registry.openshift-image-registry.svc:5000/default"
    networkExposure:    "route"
    createPVC:          true
    selfSignedRegistry: true
    httpRegistry:       false
    adminRBAC:          true
    kubernetesRBAC:     false
    openshiftRBAC:      true
    openshiftOperator:  true
    windows:            false

  aks:
    installOLM:         true
    installMetrics:     true
    installLogs:        true
    containerRegistry:  "myregistry.azurecr.io"
    containerUsername:  "azure appid"
    containerPassword:  "azure password"
    azureTenantId:      "azure tenantId"
    networkExposure:    "loadBalancer"
    domain:             "tobeset"
    createPVC:          false
    selfSignedRegistry: false
    httpRegistry:       true
    adminRBAC:          true
    kubernetesRBAC:     true
    openshiftRBAC:      false
    openshiftOperator:  false
    windows:            true

#
# sizing details
#
  small:
    nexus:
      disk: "20Gi"
      memory: "1.5Gi"
    git:
      disk: "5Gi"
    modelopsserver:
      disk: "5Gi"
    modelopsmetrics:
      memory: "10Gi"
      interval: "30"
    elasticsearch:
      disk: "5Gi"
      memory: "2Gi"
    prometheus:
      interval: "30s"

  medium:
    nexus:
      disk: "20Gi"
      memory: "1.5Gi"
    git:
      disk: "20Gi"
    modelopsserver:
      disk: "20Gi"
    modelopsmetrics:
      memory: "15Gi"
      interval: "10"
    elasticsearch:
      disk: "20Gi"
      memory: "5Gi"
    prometheus:
      interval: "10s"

  large:
    nexus:
      disk: "20Gi"
      memory: "1.5Gi"
    git:
      disk: "100Gi"
    modelopsserver:
      disk: "100Gi"
    modelopsmetrics:
      memory: "20Gi"
      interval: "10"
    elasticsearch:
      disk: "100Gi"
      memory: "10Gi"
    prometheus:
      interval: "10s"
  
#
# hence the chart may be installed :
#
#   helm install modelops target/helm/repo/modelops-1.0.0.tgz --set global.cloud=openshift
#
# or override individual settings
#
#   helm install modelops target/helm/repo/modelops-1.0.0.tgz --set global.cloud=openshift --set global.openshift.createPVC=true
#

#
# auto start deployments ( after image is built )
#
autostartdeployments:
  tools:              false
  statistica:         false
  pmml:               false
  tensorflow:         false
  scoring-flow:       false
  data-channel:       true
  scheduling-server:  true
  modelops-server:    true

#
#
# git specific settings
#
# if azureDiskURL is set, use azureDisk with that URL
#
git:
  nodePort:           30010
  username:           "modelops"
  password:           "modelops"
  repository:         "scoringpipelines"
  azureDiskURL:       ""

#
# modelops-server specific settings
#
# if azureDiskURL is set, use azureDisk with that URL
#
modelopsserver:
  nodePort:           30040
  username:           "admin"
  password:           "admin"
  azureDiskURL:       ""

#
# prometheus specific settings
#
prometheus:
  nodePort:           30050

#
# grafana specific settings
#
grafana:
  nodePort:           30060

#
# elasticsearch specific settings
#
elasticsearch:
  nodePort:           30070
  password:           "elastic"

#
# kibana specific settings
#
kibana:
  nodePort:           30080

#
# scheduling-server specific settings
#
schedulingserver:
  nodePort:           30090
  logLevel:           "INFO"

#
# data channel specific settings
#
#
datachannel:
  nodePort:           30100

So to choose the defaults for a given environment, just set global.cloud to the right environment :

$ helm install modelops modelops-1.0.0.tgz --set global.cloud=kind

However, individual settings can be overridden if required, using the global.<cloud name>.<parameter> format. For example :

$ helm install modelops modelops-1.0.0.tgz --set global.cloud=docker-for-desktop \
    --set global.docker-for-desktop.containerRegistry=myserver:30030
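
When several settings are overridden, a values file can be easier to maintain than a long list of --set flags. The following is a minimal sketch that writes the same two overrides to a file and prints the matching helm command; the file name modelops-overrides.yaml is an arbitrary choice, and --values is the standard helm option for supplying such a file :

```shell
#!/bin/sh
# Sketch: keep overrides in a values file instead of repeated --set flags.
# The file name is hypothetical; the keys mirror the --set example above.
overrides="modelops-overrides.yaml"

cat > "$overrides" <<'EOF'
global:
  cloud: docker-for-desktop
  docker-for-desktop:
    containerRegistry: "myserver:30030"
EOF

# Print the command instead of executing it, so it can be checked first.
echo "helm install modelops modelops-1.0.0.tgz --values ${overrides}"
```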

Some examples are shown in the sections below :


Docker for Desktop

To install docker for desktop :

  1. Follow the instructions at https://www.docker.com/products/docker-desktop
  2. Enable Kubernetes in the GUI
  3. Ensure there are sufficient resources allocated

ModelOps is installed using these commands:

#
#  Install ModelOps helm charts
#
$ cd ${TIBCO_EP_HOME}/ep-modelops/kubernetes-installer/target/helm/repo
$ helm install modelops \
    modelops-1.0.0.tgz \
    --set global.cloud=docker-for-desktop \
    --set global.docker-for-desktop.containerRegistry=$(hostname -s):30030
#
#  Populate the Maven repository inside the K8s cluster
#
$ cd ${TIBCO_EP_HOME}/ep-modelops/kubernetes-installer/target/
$ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip \
    mavenrepo-0:/tmp/mavenrepo.zip

Kind

To install kind :

  1. Install docker for desktop as above
  2. Follow the instructions at https://kind.sigs.k8s.io/docs/user/quick-start/
  3. Follow the instructions at https://kind.sigs.k8s.io/docs/user/local-registry/ to configure the registry

Once installed, a typical ModelOps installation command for kind is :

$ helm install modelops modelops-1.0.0.tgz --set global.cloud=kind
$ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

OpenShift Code Ready Containers (CRC)

To install code ready containers :

  1. Follow the instructions at https://cloud.redhat.com/openshift/install/crc/installer-provisioned
  2. Start with crc start --cpus 6 --memory 16384 --pull-secret-file ~/pull-secret.txt

Then a typical ModelOps installation command for OpenShift code ready containers is :

$ helm install modelops modelops-1.0.0.tgz --set global.cloud=openshift
$ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

AKS

To install AKS :

  1. Follow the instructions at https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough
  2. To support windows containers, follow the instructions at https://docs.microsoft.com/en-us/azure/aks/windows-container-cli#add-a-windows-server-node-pool

An example AKS script is :

namespace=modelops
group=modelops
cluster=modelops
acr=registry
azure_winpassword="set this"
azure_appId="set this"
azure_password="set this"
azure_tenantId="set this"

#
# create cluster
#
az group create --name ${group} --location uksouth
az acr create --name ${acr} --resource-group ${group} --sku basic
az aks create \
    --resource-group ${group} \
    --service-principal ${azure_appId} \
    --client-secret ${azure_password} \
    --name ${cluster} \
    --node-count 1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5 \
    --no-ssh-key \
    --windows-admin-password ${azure_winpassword} \
    --windows-admin-username azureuser \
    --vm-set-type VirtualMachineScaleSets \
    --node-vm-size Standard_B8ms \
    --network-plugin azure \
    --attach-acr ${acr}

#
# add windows pool and set taint to avoid default use
# (some operators will attempt to schedule linux pod on windows node)
#
az aks nodepool add \
    --resource-group ${group} \
    --cluster-name ${cluster} \
    --os-type Windows \
    --name npwin \
    --node-count 1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 2 \
    --node-vm-size Standard_B8ms \
    --node-taints os=windows:NoSchedule

#
# create any persistent volumes ( external to kubernetes / namespace )
#
nodegroup=$(az aks show --resource-group ${group} --name ${cluster} --query nodeResourceGroup -o tsv)
az disk create \
  --resource-group ${nodegroup} \
  --name modelops-server \
  --size-gb 5 \
  --query id --output tsv
az disk create \
  --resource-group ${nodegroup} \
  --name git-server \
  --size-gb 5 \
  --query id --output tsv

Once installed, a typical ModelOps installation command for AKS is :

$ helm install modelops modelops-1.0.0.tgz --atomic --set global.cloud=aks \
    --set global.aks.containerRegistry=${acr}.azurecr.io \
    --set global.aks.containerUsername=${azure_appId} \
    --set global.aks.containerPassword=${azure_password} \
    --set global.aks.azureTenantId=${azure_tenantId}
$ kubectl cp kubernetes-installer-1.0.0-mavenrepo.zip mavenrepo-0:/tmp/mavenrepo.zip

Note that the chosen Azure registry URL and authentication details must be provided.
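
The example AKS script prints the ID of each disk it creates, and the chart values include git.azureDiskURL and modelopsserver.azureDiskURL settings. The sketch below shows one way the two might be connected. It assumes, without confirmation from the chart, that these settings accept the disk ID printed by az disk create; the function name is illustrative and the IDs are placeholders :

```shell
#!/bin/sh
# Sketch: turn disk IDs into helm flags for the azureDiskURL settings.
# ASSUMPTION: the chart accepts the Azure disk resource ID as azureDiskURL.
# The function name is hypothetical.

# $1 = git-server disk ID, $2 = modelops-server disk ID
disk_flags() {
  printf '%s\n' "--set git.azureDiskURL=$1 --set modelopsserver.azureDiskURL=$2"
}

# Placeholder IDs; in practice capture the output of each `az disk create`.
git_disk_id="/subscriptions/SUBSCRIPTION/resourceGroups/NODEGROUP/providers/Microsoft.Compute/disks/git-server"
server_disk_id="/subscriptions/SUBSCRIPTION/resourceGroups/NODEGROUP/providers/Microsoft.Compute/disks/modelops-server"

disk_flags "$git_disk_id" "$server_disk_id"
```

The printed flags would be appended to the helm install command shown above.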

Upgrading

To upgrade the ModelOps components use :

$ helm upgrade modelops modelops-1.0.0.tgz ...

However, it's common practice to use the same command for installation and upgrades :

$ helm upgrade modelops modelops-1.0.0.tgz --install ...

When the installation is upgraded the installation pipeline is re-executed and a rollout restart is performed on existing pods.
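
To follow the re-executed pipeline after an upgrade, the pipeline run name is needed. In the sample output earlier, helm revision 2 corresponds to the pipeline run installation-2; the sketch below assumes that naming holds in general, which is an inference from that output rather than a documented rule. The function name and the revision number are illustrative :

```shell
#!/bin/sh
# Sketch: derive the pipeline run name from a helm revision number.
# ASSUMPTION: runs are named installation-<revision>, as in the sample output.
# The function name is hypothetical.
pipelinerun_for_revision() {
  echo "installation-$1"
}

revision=3   # hypothetical revision reported by helm after an upgrade

# Print the command instead of executing it, so it can be checked first.
echo "tkn pipelinerun logs $(pipelinerun_for_revision "$revision") --follow --namespace modelops"
```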

Uninstalling

To uninstall the ModelOps components use:

$ helm uninstall modelops

Note that this doesn’t uninstall the Kubernetes operators (so that a further install is faster).

To uninstall everything and start from scratch, reset the Kubernetes cluster, for example in Docker for Desktop:

[Figure: Reset Kubernetes Cluster]

Troubleshooting

Always ensure the kubernetes context is what you expect. For example with docker for desktop :

$ kubectl config current-context
docker-desktop

The context is also displayed in docker for desktop UI.
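
The context check can also be scripted so that an install does not proceed against the wrong cluster. This is a minimal sketch assuming the docker-desktop context shown above; the function and variable names are illustrative :

```shell
#!/bin/sh
# Sketch: compare the current kubectl context with the expected one.
# The function and variable names are hypothetical.
expected_context="docker-desktop"

# Succeed only when the two context names are identical.
context_matches() {
  [ "$1" = "$2" ]
}

# Only query kubectl when it is available.
if command -v kubectl >/dev/null 2>&1; then
  current_context=$(kubectl config current-context 2>/dev/null)
  if context_matches "$expected_context" "$current_context"; then
    echo "Context ok: ${current_context}"
  else
    echo "Unexpected context: '${current_context}' (expected '${expected_context}')" >&2
  fi
fi
```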