Cloud Installation
Introduction
This section provides general information on deploying ModelOps to a Kubernetes cluster.
Requirements
The following tools are required to complete the installation - these must be downloaded and installed prior to installing ModelOps:
- Kubernetes CLI tool
  - macOS: brew install kubectl
- Helm CLI tool
  - macOS: brew install helm
- Tekton CLI tool
  - macOS: brew install tektoncd-cli
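The presence of the required tools can be confirmed before starting; a minimal sketch (the check_tools helper is illustrative, not part of ModelOps):

```shell
# Check that the required CLI tools are on the PATH before installing.
# The tool list mirrors the requirements above; "check_tools" itself is
# just an illustrative helper, not part of ModelOps.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "MISSING: $tool"
    fi
  done
  return 0
}

check_tools kubectl helm tkn
```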
Additional requirements depend on the Kubernetes cloud platform being used:
Optional Tools
These tools are optional, but have been found to be useful:
- Lens
  - macOS: brew install lens
Azure AKS
Azure AKS also requires:
- Azure CLI tools must be installed and configured.
  - macOS: brew install azure-cli
OpenShift
OpenShift also requires:
- OpenShift CLI tools must be installed and configured.
  - macOS: brew install openshift-cli
Sizing
ModelOps can be quickly configured for a small, medium, or large installation, while also allowing further customization as needed.
While many values are best kept at their defaults, the values listed below have the biggest effect on sizing and so are exposed as install options.
Small - for single scoring
Minimum hardware is a single virtual machine with 6 cores, 30 GB memory, and 100 GB disk space. Larger configurations allow for more concurrent scoring and faster executions.
Linux-based scoring only (a hybrid deployment of Linux and Windows nodes is required for Statistica scoring).
The helm install option, --set size=small, defaults to:
Medium - for small teams, the installation default
Minimum hardware is a single virtual machine with 8 CPUs, 32 GB memory, and 100 GB disk space. Cloud infrastructure configured for cluster scaling is recommended, to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 virtual machine and a maximum of 5 virtual machines, although experience shows 2 servers are usually sufficient.
Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.
The helm install option, --set size=medium, defaults to:
Large - for larger teams
Minimum hardware is a single virtual machine with 16 CPUs, 64 GB memory, and 500 GB disk space. Cloud infrastructure configured for cluster scaling is recommended, to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 virtual machine and a maximum of 10 virtual machines.
Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.
The helm install option, --set size=large, defaults to:
Further sizing customizations
Each of the above values can be overridden as needed using the override name included above. For example, to increase the git server disk space with the medium configuration, use --set size=medium --set medium.git.disk=50Gi.
Passwords and secrets
To avoid clear-text passwords, Kubernetes provides a Secrets facility. So, prior to installation, Kubernetes Secrets have to be created to contain the passwords required by ModelOps.
These are:
Description | Secret name | Key name | Comments |
---|---|---|---|
Elasticsearch | elasticsearch-es-elastic-user | Elasticsearch user name | See https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-users-and-roles.html - if not set, Elasticsearch generates a password |
Git server | git-server | git user name | |
Nexus server | nexus-server | admin | |
ModelOps server | modelops-server | admin | |
Scoring flow admin | scoring-admin | admin | |
These secrets may be created via the cloud infrastructure or on the command line using kubectl. For example:
# elastic search
#
# note in this case we use apply to avoid elastic search re-creating the secret
#
kubectl create secret generic elasticsearch-es-elastic-user --from-literal=elastic=mysecretpassword --namespace modelops --dry-run=client --output=yaml 2>/dev/null > secret.yaml
kubectl apply --filename secret.yaml
# git server
#
kubectl create secret generic git-server --from-literal=modelops=mysecretpassword --namespace modelops
# nexus server
#
kubectl create secret generic nexus-server --from-literal=admin=mysecretpassword --namespace modelops
# modelops server
#
kubectl create secret generic modelops-server --from-literal=admin=mysecretpassword --namespace modelops
# scoring admin
#
kubectl create secret generic scoring-admin --from-literal=admin=mysecretpassword --namespace modelops
NOTE: The Elasticsearch password is limited to alphanumeric, ".", "_", "~", and "-" characters, i.e. it must conform to the regular expression ^[a-zA-Z0-9._~-]+$.
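Given that constraint, a candidate password can be checked before the secret is created; a minimal sketch using POSIX grep (the is_valid_es_password helper is illustrative):

```shell
# Validate a candidate Elasticsearch password against the allowed
# character set before creating the secret. "is_valid_es_password" is an
# illustrative helper; the regular expression is the one from the note above.
is_valid_es_password() {
  printf '%s' "$1" | grep -Eq '^[a-zA-Z0-9._~-]+$'
}

if is_valid_es_password "my.secret_pass-1"; then
  echo "password ok"
else
  echo "password contains disallowed characters"
fi
```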
It is recommended to install an encryption provider for maximum security - see https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/.
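As a sketch of what such a configuration looks like (following the linked Kubernetes documentation; the key name and key material are placeholders, and the file must be referenced by the API server's --encryption-provider-config flag):

```yaml
# Sketch of an EncryptionConfiguration, following the Kubernetes docs linked
# above; the key name and key material are placeholders.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>
      - identity: {}
```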
Quick run through
The Helm CLI tool is used to install the ModelOps components to Kubernetes:
$ helm upgrade --install installer helm-charts/kubernetes-installer-1.1.0.tgz --atomic --set cloud=aks
This command first installs and starts the bootstrap pipeline, which installs the required Kubernetes operators - this takes a few seconds, after which the helm command returns with a summary of the installation.
For example :
Release "installer" does not exist. Installing it now.
NAME: installer
LAST DEPLOYED: Mon Jan 24 14:04:38 2022
NAMESPACE: modelops
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing ep-kubernetes-installer configured for docker-for-desktop in kubernetes v1.22.5
The Operator Lifecycle Manager has been installed
The bootstrap pipeline has started which includes :
Adding kubernetes permissions
Installing a nexus server
Installing ElasticSearch and Kibana
Installing Prometheus
Populating the nexus repository with artifacts
Creating a product install pipeline from helm charts
Starting the nexus server at :
Internal web console URL - http://artifact-repository:80/
Maven repository - http://artifact-repository:80/repository/maven-public/
Helm repository - http://artifact-repository:80/repository/helm/
PyPi proxy - http://artifact-repository:80/repository/pypi-group
Container registry - 192.168.175.10:8082
Starting prometheus server at :
Internal URL - http://prometheus.modelops:9090
Starting elasticsearch server at :
Internal URL - http://elasticsearch-es-http:9200
Userid is elastic, password set in kubernetes secret
Starting kibana server at :
Internal URL - http://kibana-kb-http
Userid is elastic, password set in kubernetes secret
The docker daemon should be configured to allow http pull requests :
{
"insecure-registries": [
"192.168.175.10:8082"
]
}
To track the progress of the bootstrap pipeline run :
tkn pipelinerun logs bootstrap --follow --namespace modelops
The output depends on the cloud platform and any additional options selected. These details are also displayed by the helm status installer command.
The zip of maven artifacts should be copied using the kubectl cp command:
$ kubectl cp modelops-repo-1.3.0-mavenrepo.zip mavenrepo-0:/tmp/ --namespace modelops
At this point the installation has started and, as mentioned above, its status can be monitored with tkn pipelinerun logs bootstrap -f. For example:
$ tkn pipelinerun logs bootstrap --follow --namespace modelops
[nexus : nexus] Installing nexus operator
[nexus : nexus] namespace/nexus-operator-system created
[nexus : nexus] customresourcedefinition.apiextensions.k8s.io/nexus.apps.m88i.io created
[nexus : nexus] role.rbac.authorization.k8s.io/nexus-operator-leader-election-role created
[nexus : nexus] clusterrole.rbac.authorization.k8s.io/nexus-operator-manager-role created
[nexus : nexus] clusterrole.rbac.authorization.k8s.io/nexus-operator-metrics-reader created
[nexus : nexus] clusterrole.rbac.authorization.k8s.io/nexus-operator-proxy-role created
[nexus : nexus] rolebinding.rbac.authorization.k8s.io/nexus-operator-leader-election-rolebinding created
[nexus : nexus] clusterrolebinding.rbac.authorization.k8s.io/nexus-operator-manager-rolebinding created
[nexus : nexus] clusterrolebinding.rbac.authorization.k8s.io/nexus-operator-proxy-rolebinding created
[nexus : nexus] service/nexus-operator-controller-manager-metrics-service created
....
[install-pipeline-run : run] 14:16:27.765 [main] INFO com.tibco.streaming.installpipeline.Kubernetes - To track the progress of the modelops-server pipeline run :
[install-pipeline-run : run] 14:16:27.766 [main] INFO com.tibco.streaming.installpipeline.Kubernetes - tkn pipelinerun logs modelops-server --follow --namespace modelops
The installation process can run tasks in parallel - hence the output is prefixed with the task name, and lines are coloured.
Once the bootstrap pipeline has completed, the application pipeline can be monitored in a similar way:
$ tkn pipelinerun logs modelops-server --follow --namespace modelops
....
[scheduling-server-scale : scale] Resuming rollout of scheduling-server
[scheduling-server-scale : scale] deployment.apps/scheduling-server resumed
[data-channel-registry-prepare : prepare] Preparing directory for data-channel-registry
....
The installation is complete when the tkn pipelinerun logs modelops-server --follow --namespace modelops command completes. The tkn taskrun list command shows the task status:
$ tkn taskrun list --namespace modelops
NAME STARTED DURATION STATUS
modelops-server-modelops-server-scale 1 minute ago 20 seconds Succeeded
modelops-server-modelops-metrics-scale 3 minutes ago 11 seconds Succeeded
modelops-server-kafka-datasource-image 4 minutes ago 2 minutes Succeeded
modelops-server-kafka-datasink-image 4 minutes ago 2 minutes Succeeded
modelops-server-modelops-server-image 4 minutes ago 3 minutes Succeeded
modelops-server-scoring-flow-image 5 minutes ago 5 minutes Succeeded
modelops-server-test-datasource-image 5 minutes ago 1 minute Succeeded
modelops-server-test-datasink-image 5 minutes ago 2 minutes Succeeded
modelops-server-modelops-metrics-image 6 minutes ago 2 minutes Succeeded
modelops-server-modelops-metrics-maven 7 minutes ago 1 minute Succeeded
modelops-server-kafka-datasource-maven 8 minutes ago 3 minutes Succeeded
modelops-server-test-datasource-maven 8 minutes ago 2 minutes Succeeded
modelops-server-scoring-flow-maven 8 minutes ago 2 minutes Succeeded
modelops-server-test-datasink-maven 8 minutes ago 2 minutes Succeeded
modelops-server-kafka-datasink-maven 8 minutes ago 3 minutes Succeeded
modelops-server-modelops-server-maven 8 minutes ago 3 minutes Succeeded
modelops-server-scoring-flow-prepare 8 minutes ago 43 seconds Succeeded
modelops-server-kafka-datasource-prepare 8 minutes ago 51 seconds Succeeded
modelops-server-test-datasource-prepare 8 minutes ago 44 seconds Succeeded
modelops-server-kafka-datasink-prepare 9 minutes ago 42 seconds Succeeded
modelops-server-test-datasink-prepare 9 minutes ago 42 seconds Succeeded
modelops-server-modelops-metrics-prepare 9 minutes ago 1 minute Succeeded
modelops-server-modelops-server-prepare 9 minutes ago 42 seconds Succeeded
modelops-server-scheduling-server-scale 10 minutes ago 39 seconds Succeeded
modelops-server-data-channel-registry-scale 10 minutes ago 19 seconds Succeeded
modelops-server-sbrt-base-image 20 minutes ago 11 minutes Succeeded
modelops-server-statistica-image 21 minutes ago 3 minutes Succeeded
modelops-server-tensorflow-image 21 minutes ago 14 minutes Succeeded
modelops-server-rest-datasink-image 21 minutes ago 4 minutes Succeeded
modelops-server-spark-image 21 minutes ago 14 minutes Succeeded
modelops-server-rest-datasource-image 21 minutes ago 4 minutes Succeeded
modelops-server-python-image 21 minutes ago 15 minutes Succeeded
modelops-server-file-datasource-image 21 minutes ago 5 minutes Succeeded
modelops-server-file-datasink-image 21 minutes ago 3 minutes Succeeded
modelops-server-rest-request-response-datachannel-image 21 minutes ago 4 minutes Succeeded
modelops-server-pmml-image 21 minutes ago 10 minutes Succeeded
modelops-server-scheduling-server-image 21 minutes ago 11 minutes Succeeded
modelops-server-jdbc-datasource-image 21 minutes ago 2 minutes Succeeded
modelops-server-data-channel-registry-image 21 minutes ago 10 minutes Succeeded
modelops-server-git-server-scale 23 minutes ago 15 seconds Succeeded
modelops-server-sbrt-base-maven 24 minutes ago 3 minutes Succeeded
modelops-server-rest-datasink-maven 24 minutes ago 3 minutes Succeeded
modelops-server-file-datasink-maven 24 minutes ago 3 minutes Succeeded
modelops-server-data-channel-registry-maven 24 minutes ago 3 minutes Succeeded
modelops-server-pmml-maven 25 minutes ago 3 minutes Succeeded
modelops-server-rest-datasource-maven 25 minutes ago 4 minutes Succeeded
modelops-server-spark-maven 25 minutes ago 4 minutes Succeeded
modelops-server-scheduling-server-maven 25 minutes ago 4 minutes Succeeded
modelops-server-git-server-image 25 minutes ago 2 minutes Succeeded
modelops-server-file-datasource-maven 25 minutes ago 4 minutes Succeeded
modelops-server-tensorflow-maven 25 minutes ago 4 minutes Succeeded
modelops-server-python-maven 25 minutes ago 4 minutes Succeeded
modelops-server-rest-request-response-datachannel-maven 25 minutes ago 4 minutes Succeeded
modelops-server-jdbc-datasource-maven 25 minutes ago 4 minutes Succeeded
modelops-server-statistica-maven 25 minutes ago 4 minutes Succeeded
modelops-server-rest-datasink-prepare 26 minutes ago 1 minute Succeeded
modelops-server-git-server-prepare 26 minutes ago 40 seconds Succeeded
modelops-server-spark-prepare 26 minutes ago 46 seconds Succeeded
modelops-server-sbrt-base-prepare 26 minutes ago 1 minute Succeeded
modelops-server-tensorflow-prepare 26 minutes ago 38 seconds Succeeded
modelops-server-data-channel-registry-prepare 26 minutes ago 1 minute Succeeded
modelops-server-file-datasink-prepare 26 minutes ago 1 minute Succeeded
modelops-server-file-datasource-prepare 26 minutes ago 37 seconds Succeeded
modelops-server-rest-request-response-datachannel-prepare 26 minutes ago 39 seconds Succeeded
modelops-server-statistica-prepare 26 minutes ago 35 seconds Succeeded
modelops-server-scheduling-server-prepare 26 minutes ago 48 seconds Succeeded
modelops-server-pmml-prepare 26 minutes ago 1 minute Succeeded
modelops-server-rest-datasource-prepare 26 minutes ago 52 seconds Succeeded
modelops-server-python-prepare 26 minutes ago 39 seconds Succeeded
modelops-server-jdbc-datasource-prepare 26 minutes ago 36 seconds Succeeded
bootstrap-install-pipeline-run 27 minutes ago 23 seconds Succeeded
bootstrap-install-pipeline-image 27 minutes ago 55 seconds Succeeded
bootstrap-install-pipeline-maven 29 minutes ago 1 minute Succeeded
bootstrap-nexus-helm-index 29 minutes ago 20 seconds Succeeded
bootstrap-install-pipeline-prepare 29 minutes ago 20 seconds Succeeded
bootstrap-ingress 31 minutes ago 1 minute Succeeded
bootstrap-elasticsearch 31 minutes ago 2 minutes Succeeded
bootstrap-deploy-artifacts 31 minutes ago 2 minutes Succeeded
bootstrap-prometheus 31 minutes ago 1 minute Succeeded
bootstrap-tools-image 33 minutes ago 1 minute Succeeded
bootstrap-tools-prepare 33 minutes ago 7 seconds Succeeded
bootstrap-nexus-repositories 37 minutes ago 4 minutes Succeeded
bootstrap-nexus 37 minutes ago 15 seconds Succeeded
bootstrap-tidy-up 37 minutes ago 10 seconds Succeeded
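With this many tasks, it can help to filter the listing down to anything that has not succeeded. A sketch assuming the column layout shown above (the filter_failures helper is illustrative):

```shell
# Filter a `tkn taskrun list` style listing down to rows whose final
# STATUS column is not "Succeeded". "filter_failures" is an illustrative
# helper; it assumes the header-plus-columns layout shown above.
filter_failures() {
  awk 'NR > 1 && $NF != "Succeeded"'
}

# Usage against a live cluster would be:
#   tkn taskrun list --namespace modelops | filter_failures
printf '%s\n' \
  'NAME STARTED DURATION STATUS' \
  'task-a 1 minute ago 20 seconds Succeeded' \
  'task-b 2 minutes ago 1 minute Failed' | filter_failures
```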
Alternative ways to manage volumes
By default, the ModelOps installation creates Kubernetes persistent volume claims for the ModelOps and git servers. However, if needed, these volumes can be managed differently.
- Pre-create persistent volume claims
In this case the administrator pre-creates git-server and modelops-server persistent volume claims (perhaps with a custom storage class and reclaim policy) and specifies the createPVC=false option when installing ModelOps:
$ helm upgrade --install installer helm-charts/kubernetes-installer-1.1.0.tgz --atomic --set cloud=aks --set createPVC=false
- Use a custom storage class
In this case the administrator pre-creates a custom storage class (or specifies a non-default one) and specifies the modelopsserver.storageClass option when installing ModelOps:
$ helm upgrade --install installer helm-charts/kubernetes-installer-1.1.0.tgz --atomic --set cloud=aks --set modelopsserver.storageClass=customclass
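For the pre-created claim alternative above, a persistent volume claim manifest might look like the following sketch - the claim names git-server and modelops-server come from this section, while the storage class, access mode, and size shown are placeholders to adjust for your cluster:

```yaml
# Sketch: pre-created claim for the git server; repeat for modelops-server.
# storageClassName, accessModes, and the storage size are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: git-server
  namespace: modelops
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: customclass
  resources:
    requests:
      storage: 20Gi
```

Apply it with kubectl apply --filename before running the helm install with --set createPVC=false.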
The installation pipeline
The install process is controlled via a Tekton pipeline called installation. This pipeline first installs the following Kubernetes operators during the pre-install hook:
- Operator Lifecycle Manager (if required)
- Tekton pipeline operator
- Tekton triggers operator
- Nexus 3 operator
- Prometheus operator
- Elastic Cloud on Kubernetes operator
- General purpose tools image - used for various build and deploy tasks
Kubernetes permissions are added to support Role-based access control (RBAC), security context constraints (SCC) and Streaming discovery.
The following container images are built in Kubernetes:
- GIT server image - used to hold the ModelOps artifacts
The ModelOps helm charts also create:
- TIBCO Streaming runtime base image - used as a base for further images
- TIBCO ModelOps Server image - scoring pipeline, flow, and model management
- TIBCO ModelOps Scoring Server image for each runner - model scoring
- TIBCO Data Channel Registry - data source and sink registration and discovery
- TIBCO ModelOps Scheduling Server - job scheduling
- Various data channels
The following services are started :
- GIT server
- TIBCO ModelOps Server
- Nexus repository configured with :
- Maven repository, populated with TIBCO artifacts
- Python repository (both proxy and hosted)
- Container registry
- Helm chart repository
- TIBCO Data Channel Registry
- TIBCO Scheduling Server
Finally, the installation deploys a helm chart that is later used to deploy a ModelOps server.
Kubernetes rollout is paused during the installation process and resumed once new container images are available.
Individual pipeline tasks are scheduled by dependency and available resources.
Cloud platform differences
Kubernetes features differ between platforms, and so the installation process also varies slightly. In general, natively provided features are used in preference to custom provided features. These differences are shown below:
Feature | OpenShift | AKS | EKS |
---|---|---|---|
Operator Lifecycle Manager | Provided | Installed | Installed |
Container registry | ImageStream | ACR | ECR |
Network exposure | route | Ingress | Ingress |
RBAC supported | Yes | Yes | Yes |
SCC supported | Yes | No | No |
Windows images supported | No | Yes | No |
These differences are controlled via ModelOps helm chart values - these can be viewed with the helm show values kubernetes-installer-1.1.0.tgz command, for example:
$ helm show values kubernetes-installer-1.1.0.tgz
#
# Default values for the chart
#
#
# cloud environment
#
cloud: docker-for-desktop
#
# image pull policy
#
pullpolicy: "IfNotPresent"
#
# sizing
#
size: medium
#
# operator lifecycle manager specific settings
#
olm:
operatorVersion: "v0.17.0"
#
# tekton specific settings
#
tekton:
operatorVersion: "latest"
#
# nexus specific settings
#
nexus:
operatorVersion: "v0.6.0"
internalPort: 80
nodePort: 30020
containerNodePort: 30030
hostname: "artifact-repository"
maven:
maven-proxy:
url: "https://repo1.maven.org/maven2/"
pypi:
pypi-proxy:
url: "https://pypi.org/"
yum:
yum-proxy:
url: "https://repo.almalinux.org/almalinux"
#
# The following values are defaulted depending on cloud type :
#
# installOLM - install the operator lifecycle manager
#
# containerRegistry - base URI of container registry. Use the supplied one
# if available.
#
# containerUsername/containerPassword - if set, used to access container registry
#
# networkExposure - mechanism to use to expose network
#
# createPVC - if true create persistent volume claim in helm chart, if false
# the persistent volume claim must be created before installing the chart.
#
# selfSignedRegistry - if true then skip tls verification on registry
#
# httpRegistry - if true then use http registry
#
# roleBasedAccessControl - kubernetes or openshift
#
# windows - if true build windows container (currently statistica scoring server)
#
# dnsSuffix - AKS only, set azure annotation for public dns name, i.e. <container>-<dnsSuffix>.<region>.cloudapp.azure.com
#
docker-for-desktop:
installOLM: true
installMetrics: true
installLogs: true
containerRegistry: "localhost:5000"
networkExposure: "nodePort"
createPVC: true
httpRegistry: true
selfSignedRegistry: false
roleBasedAccessControl: "kubernetes"
windows: false
ingressDomain: "tobeset"
kind:
installOLM: true
installMetrics: true
installLogs: true
containerRegistry: "kind-registry:5000"
networkExposure: "ingress"
createPVC: true
selfSignedRegistry: false
httpRegistry: true
roleBasedAccessControl: "kubernetes"
windows: false
ingressDomain: "tobeset"
colima:
installOLM: true
installMetrics: true
installLogs: true
containerRegistry: "localhost:5000"
networkExposure: "nodePort"
createPVC: true
httpRegistry: true
selfSignedRegistry: false
roleBasedAccessControl: "kubernetes"
windows: false
ingressDomain: "tobeset"
openshift:
installOLM: false
installMetrics: true
installLogs: true
containerRegistry: "image-registry.openshift-image-registry.svc:5000/{{ .Release.Namespace }}"
networkExposure: "route"
createPVC: true
selfSignedRegistry: true
httpRegistry: false
roleBasedAccessControl: "openshift"
windows: false
ingressDomain: "tobeset"
aks:
installOLM: true
installMetrics: true
installLogs: true
containerRegistry: "myregistry.azurecr.io"
containerUsername: "azure appid"
containerPassword: "azure password"
azureTenantId: "azure tenantId"
networkExposure: "ingress"
createPVC: true
selfSignedRegistry: false
httpRegistry: true
roleBasedAccessControl: "kubernetes"
windows: true
ingressDomain: "tobeset"
# oauth2: "azure"
eks:
installOLM: true
installMetrics: true
installLogs: true
containerRegistry: "eks registry"
region: "region"
networkExposure: "ingress"
createPVC: true
selfSignedRegistry: false
httpRegistry: true
roleBasedAccessControl: "kubernetes"
windows: false
ingressDomain: "tobeset"
# oauth2: "cognito"
#
# sizing details
#
small:
general:
cpu: "2"
memory: "400Mi"
nexus:
disk: "20Gi"
memory: "2Gi"
elasticsearch:
disk: "10Gi"
memory: "2Gi"
prometheus:
intervalDuration: "30s"
diskGi: "10Gi"
storage.tsdb.retention.size: "8GB"
medium:
general:
cpu: "2"
memory: "400Mi"
nexus:
disk: "20Gi"
memory: "2Gi"
elasticsearch:
disk: "50Gi"
memory: "5Gi"
prometheus:
intervalDuration: "10s"
diskGi: "50Gi"
storage.tsdb.retention.size: "48GB"
large:
general:
cpu: "2"
memory: "400Mi"
nexus:
disk: "20Gi"
memory: "2Gi"
elasticsearch:
disk: "100Gi"
memory: "10Gi"
prometheus:
intervalDuration: "10s"
diskGi: "100Gi"
storage.tsdb.retention.size: "98GB"
#
# hence the chart may be installed :
#
# helm install installer kubernetes-installer-[version].tgz --set cloud=openshift
#
# or override individual settings
#
# helm install installer kubernetes-installer-[version].tgz --set cloud=openshift --set openshift.createPVC=true
#
#
# Kubernetes DNS domain - not generally used but needed for windows work-arounds (see TMO-1156)
#
clusterName: "svc.cluster.local"
#
# prometheus specific settings
#
# if storageClass is set, use storageClass in volumeClaimTemplate (otherwise system default is used)
#
# See https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects for retention time
#
prometheus:
operatorVersion: "30.0.1"
nodePort: 30050
storage.tsdb.retention.time: "1y"
storageClass: ""
# see https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfigspec
# alerts:
# route:
# groupBy: ['job']
# receiver: "test"
# receivers:
# - name: "test"
# emailConfigs:
# - to: plord@tibco.com
# from: plord@tibco.com
# smarthost: smtp-relay.gmail.com:587
#
# elasticsearch specific settings
#
elasticsearch:
operatorVersion: "1.9.1"
version: "7.16.2"
nodePort: 30070
username: "elastic"
#
# kibana specific settings
#
kibana:
version: "7.16.2"
nodePort: 30080
operatorVersion: "1.9.1"
#
# ingress nginx specific settings
#
ingressnginx:
version: "4.0.1"
#
# cert manager specific settings
#
certmanager:
version: "v1.6.1"
#
# Oauth2
#
oauth2:
azure:
# oauth2 values for azure
#
# need a secret "oauth2" with
#
# TENANT_ID set to azure tenantid
# CLIENT_ID set to azure application id
# CLIENT_SECRET set to azure client secret
#
identityAttributeName: "unique_name"
roleAttributeName: "roles"
jwtAudience: "${CLIENT_ID}"
jwtIssuer: "https://sts.windows.net/${TENANT_ID}/"
jwksURL: "https://login.microsoftonline.com/common/discovery/keys"
jwksCacheTimeoutSeconds: "3600"
ssoLogoutURL: "https://login.microsoftonline.com/${TENANT_ID}/oauth2/logout?post_logout_redirect_uri=https://modelops-server.${MODELOPS_DOMAIN}/oauth2/sign_out"
# oauth2-proxy settings - see https://oauth2-proxy.github.io/oauth2-proxy/docs/
provider: "azure"
emailclaim: "unique_name"
azuretenant: "${TENANT_ID}"
oidcissuerurl: "https://sts.windows.net/${TENANT_ID}/"
extrajwtissuers: "https://login.microsoftonline.com/${TENANT_ID}/v2.0=${CLIENT_ID}"
clientid: "${CLIENT_ID}"
clientsecret: "${CLIENT_SECRET}"
whitelist: "login.microsoftonline.com/${TENANT_ID}"
cognito:
# oauth2 values for amazon cognito
#
# need a secret "oauth2" with
#
# REGION set to cognito region
# POOL_ID set to cognito pool id
# CLIENT_ID set to cognito client id
# CLIENT_SECRET set to cognito client secret
# DOMAIN set to cognito domain
#
identityAttributeName: "email"
roleAttributeName: "cognito:groups"
jwtAudience: "${CLIENT_ID}"
jwtIssuer: "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
jwksURL: "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}/.well-known/jwks.json"
jwksCacheTimeoutSeconds: "3600"
ssoLogoutURL: "https://${DOMAIN}.auth.${REGION}.amazoncognito.com/logout?client_id=${CLIENT_ID}&logout_uri=https://modelops-server.${MODELOPS_DOMAIN}/oauth2/sign_out"
# oauth2-proxy settings - see https://oauth2-proxy.github.io/oauth2-proxy/docs/
provider: "oidc"
emailclaim: "email"
oidcissuerurl: "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
clientid: "${CLIENT_ID}"
clientsecret: "${CLIENT_SECRET}"
whitelist: "tibco-modelops.auth.${REGION}.amazoncognito.com"
So to choose the defaults for a given environment, just set cloud to the right environment:
$ helm install modelops kubernetes-installer-1.1.0.tgz --set cloud=aks
However, individual settings can be overridden if required, using the <cloud name>.<parameter> format. For example:
$ helm install modelops kubernetes-installer-1.1.0.tgz --set cloud=aks \
--set aks.containerRegistry=myserver:30030
Upgrading
To upgrade the ModelOps components use :
$ helm upgrade installer kubernetes-installer-1.1.0.tgz ...
However, it's common practice to use the same command for installation and upgrades:
$ helm upgrade installer kubernetes-installer-1.1.0.tgz --install ...
When the installation is upgraded, the installation pipeline is re-executed and a rollout restart is performed on existing pods.
Uninstalling
To uninstall the ModelOps components use:
$ helm uninstall modelops
Note that this doesn't uninstall the Kubernetes operators (so that a further install is faster).
Troubleshooting
Always ensure the Kubernetes context is what you expect. For example:
$ kubectl config current-context
modelops
The context is also displayed in the Docker Desktop UI.