AWS EKS Installation

INTRODUCTION

TIBCO® ModelOps is a cloud-native model management and scoring environment supporting deployment of machine learning pipelines, models, and data sources and sinks. It is installed onto the customer's cloud infrastructure in a number of steps. These instructions outline how to gather your cloud subscription information, download the necessary tools, create various AWS resources, and then install the helm chart.

This guide’s intention is to help a user deploy ModelOps to AWS using AWS CLI commands. You should not need to be an AWS or cloud expert to follow these instructions; the goal is to provide enough context for even novice cloud users to complete the installation. Where possible, additional reference information is linked for further study. More experienced AWS users should be able to skim through the guide without taking in all the context.

Overview of the installation steps

These are the installation steps that must be performed, in the order shown.

  1. RUN MODELOPS INSTALLER, UNPACK NECESSARY ITEMS
  2. LOGIN TO AWS CLI
  3. CREATE ECR
  4. CREATE REPOSITORY AT ECR
  5. CREATE AN EKS CLUSTER (with autoscale)
  6. ADD NODEGROUP
  7. VERIFY NODES AND SHOW NODEGROUP
  8. ASSIGN IAM POLICIES
  9. CONFIGURE KUBECTL
  10. ASSIGN TAINT TO WINDOWS NODE
  11. CREATE MODELOPS NAMESPACE
  12. INSTALL THE SECRETS
  13. INSTALL THE HELM CHART
  14. COPY MAVEN ARTIFACTS (using kubectl)
  15. MONITOR INSTALLATION PROCESS (with tkn)
  16. UPDATE DNS

PREREQUISITES

In order to accomplish the AWS command line installation, you need access to a number of resources and tools.

REQUIRED RESOURCES

  • The platform-specific ModelOps installer

    The installer contains platform-specific helm charts and maven repositories which are required in subsequent steps. These will be available as a result of running the ModelOps installer (step 1).

  • An active assume-role AWS account with admin access.

REQUIRED TOOLS

Installation instructions for these tools are in the prerequisites installation section:

  • AWS CLI
  • eksctl CLI
  • Helm CLI
  • kubectl
  • Tekton CLI (tkn)
  • Lens

AWS PREREQUISITES

  • An AWS (individual) user account with admin access, OR an AWS (federated / shared) user account with assume-role access to an administrative role.
  • The user must be able to modify the following AWS services:
    • EKS
    • ECR
    • Route53
    • IAM
    • VPC
    • EC2
    • CloudFormation

INSTALLATION OF PREREQUISITES

Download and install AWS CLI tools (AWS CLI Tools)

On macOS, install via brew:

brew install awscli

On Linux, install via curl:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

On Windows,

Download and run the AWS CLI MSI installer for Windows (64-bit):
C:\> msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi

Download and install EKSCTL CLI tools (EKSCTL CLI Tools)

On macOS, install via brew:

brew install weaveworks/tap/eksctl   

On Linux, install via curl:

Download and extract the latest release of eksctl with the following command.
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

Move the extracted binary to /usr/local/bin.
sudo mv /tmp/eksctl /usr/local/bin

Test that your installation was successful with the following command.
eksctl version

On Windows, install via choco:

choco install -y eksctl 

Test that your installation was successful with the following command.
eksctl version    

Download and install Helm (Helm CLI tool)

On macOS, install via brew:

brew install helm

On Linux, install via curl:

curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

On Windows, install via choco:

choco install kubernetes-helm

Install Kubectl (Kubectl)

On macOS, install via brew

brew install kubectl

On Linux, install via curl:

Download the latest release with the command:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

On Windows, install via choco:

choco install kubernetes-cli
kubectl version --client

Download and Install Tekton CLI tool (Tekton CLI tool)

On macOS, install via brew:

brew install tektoncd-cli

On Linux, install via curl:

curl -LO https://github.com/tektoncd/cli/releases/download/v0.21.0/tkn_0.21.0_Linux_x86_64.tar.gz
sudo tar xvzf tkn_0.21.0_Linux_x86_64.tar.gz -C /usr/local/bin/ tkn

On Windows, install via choco:

choco install tektoncd-cli --confirm

Download and Install Lens (Lens)

On macOS, install via brew:

brew install lens

On Linux, install via snap:

sudo snap install Lens-{version}.amd64.snap --dangerous --classic

On Windows,

Download the Windows x64 installer from https://k8slens.dev/
Once it is downloaded, run the installer Lens-Setup-{version}.exe
By default, Lens is installed under C:\Users\{username}\AppData\Local\Programs\Lens
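
Once the tools are installed, you can confirm that each one is on your PATH using its standard version command:

aws --version
eksctl version
helm version
kubectl version --client
tkn version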

INSTALLATION STEPS

RUN MODELOPS INSTALLER, UNPACK NECESSARY ITEMS

Run the installer for your platform. The installer names are:

Platform   Name of the installer
Linux      TIB_modelops_1.2.0_linux_x86_64.archive-bin
macOS      TIB_modelops_1.2.0_macosx_x86_64.dmg or TIB_modelops_1.2.0_macosx_x86_64.archive-bin
Windows    TIB_modelops_1.2.0_win_x86_64.msi

NOTE: As downloaded from the TIBCO distribution site, the ModelOps DMG installer is delivered in a single file stored in zip format.

Agree to the EULA.

Set location for installation (or accept default)

On Windows you may be asked to allow the app from an unknown publisher to make changes to your device. Select “Yes”.

The installer will place a Helm chart and Maven repository in the install directory. You will need both these artifacts when you deploy the product onto your cloud infrastructure. After the installation these items can be located in the following locations by default:

Platform   Helm Chart
Linux      /opt/tibco/modelops/n.m/helm-charts/kubernetes-installer-1.0.2.tgz
macOS      ./TIBCO Streaming ModelOps/n.m/helm-charts/kubernetes-installer-1.0.2.tgz
Windows    C:\TIBCO\modelops\n.m\helm-charts\kubernetes-installer-1.0.2.tgz

NOTE: Here, n.m represents the release version, where n is the major release and m is the minor release.

Platform   Maven Repository
Linux      /opt/tibco/modelops/n.m/maven-repository-artifacts/modelops-repo-1.2.0-mavenrepo.zip
macOS      ./TIBCO Streaming ModelOps/n.m/maven-repository-artifacts/modelops-repo-1.2.0-mavenrepo.zip
Windows    C:\TIBCO\modelops\n.m\maven-repository-artifacts\modelops-repo-1.2.0-mavenrepo.zip

LOGIN TO AWS CLI

Log in to the AWS CLI from the command line. Below are the two ways to log in through the AWS CLI, depending on your type of user access.

  1. AWS (Individual) user account having Admin Access.

    • Quick configuration with aws configure (AWS CLI Configuration)
    • To use the CLI you need to use the user access key from your individual account.
    • To get your access key and secret, log into the console using your account.
    • Go to Services and choose “IAM”. On the left, go to “Users” and select your user name in the panel on the right, then click the Security credentials tab. Click “Create access key” to generate a new key and secret.

    In a terminal, run:

    aws configure --profile %profile-name%
    

    e.g.

    aws configure --profile federated
    

    It will prompt you to enter your access key and secret, the default region, and the default output mode (optional). This will create or update two files in ~/.aws on Linux/macOS or C:\Users\<user name>\.aws on Windows. The files are called credentials and config. The following example shows sample values. Replace them with your own values as described in the following sections.

    $aws configure --profile federated
    AWS Access Key ID [None]: AKYTUYTFODNN7EXAMPLE
    AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    Default region name [None]: us-west-1
    Default output format [None]: json
    
    
    aws --version
    
  2. AWS (shared) user account having Assume role access with Administrative role.

    • This assumes you have configured the AWS CLI using the above steps and that your AWS administrator has provided you with assume-role access.

    • Edit the ‘.aws/config’ file and add two extra lines under the new profile:

    1. role_arn : the role you want to assume.
    2. source_profile : the name of the profile in the credentials file.

      [profile federated]
      role_arn = arn:aws:iam::ChangeThisWithTheSharedAccountID:role/ChangeThisWithTheRoleName
      source_profile = SharedAccount
      output = json
      region = us-west-2

    • The credentials file should look something like this:

      [federated]
      aws_access_key_id = YourAccessKey
      aws_secret_access_key = YourSecretKey

    • To switch to your assume role user, you have to execute below command.

      aws sts assume-role --role-arn "arn:aws:iam::%sharedaccount-id%:role/example-role" --role-session-name %role-session-name% --duration-seconds 43200

    NOTE: The value of --duration-seconds indicates how long the token is valid. The maximum value of --duration-seconds can be found in the AWS console under IAM -> Role name -> Maximum session duration.

    eg.

    aws sts assume-role \
        --role-arn arn:aws:iam::30942343569:role/Administrator \
        --role-session-name Assume-role \
        --profile federated \
        --duration-seconds 43200
    

    The above command outputs several pieces of information. From the Credentials block, you need AccessKeyId, SecretAccessKey, and SessionToken.

    {
       "Credentials": {
        "AccessKeyId": "ASIAUQDZSVBWS5ETW7ET",
        "SecretAccessKey": "GrowYTjtvgrMjpdWrW0wDa5YcpOzQRLaUq2e3/Ry+6csZs4",
        "SessionToken": "IQoJb3JpZ2luX2VjEKT//////////wEaCXVzLWVhc3QtMiJHasdasdaMEUCIEEvv0raHD7CahCXa95brWPyKBR8gAu7izrMAiEAsrY9sZOizSn81OZTiU0YqCDhsV79h1MdzvBIqBudLhAqngII/f//////////ARADGgwzMDk0OTI2MjM0NjkiDOEHEsy0mtHDwfAkryryAWPZbcONL+ohB7SfdQrQKcAXukkGNOb0OSLESMLL1pIP5GyPkFGKPyCs/A5dFQnJPp96u2ZuhjDerkkqKrWp0CzJHs3+GyEmfjydnbyDsOvpLm10yT1SnvluNZUf9ZgNwkb2++YxUxBIfUeW3hKHM/aDbQ85X8wswpaQ98X0BgdgV1l68PK+qw2u9b/ndzb3T5Mvw0ic063YJt3ZASZCH593+UK/dFfXOiQc4gUmd8QYYbqbpYSjncJSWoWmO/kwzy7GiEs8DVv2f3Bk9TW4BVL6xg6eACYBCrh+eD7AGq1Fxg8KdkIzgu/I67tytr0Jy5s7MIHdqooGOp0BPwISyZc6Zlq7JO5tZQ0tJNQ1eeaQPRgfppB+/+VfYu+qHiS/BZyWsHYZZ7OS50WBNxcWsLP8FhRpmoIsyCXnfWRkOhdizb5noT7F/p+nVvQp/X7F1o7/2ys9J+kZphLfp4A535fKTqkRi9dtQawJryyHgZ4zR891tO2cVAjQo7jRCFhOC3NxgpFxJmRyFvbiEtx2HkhJZr1fSSDIIQ==",
        "Expiration": "2021-09-22T05:18:09+00:00"
        },
    "AssumedRoleUser": {
        "AssumedRoleId": "AROAJFXHVJ3THL3E5L7TE:agulhane",
        "Arn": "arn:aws:sts::30942343569:assumed-role/Administrator/testuser"
        }
    }
    

    Now, modify the ‘.aws/credentials’ file in your home directory by creating a [default] profile containing aws_access_key_id, aws_secret_access_key, and aws_session_token.

    [default]
    aws_access_key_id = ASIAUQDZSVBWS5ETW7ET
    aws_secret_access_key = GrowYTjtvgrMjpdWrW0wDa5YcpOzQRLaUq2e3/Ry+6csZs4
    aws_session_token = IQoJb3JpZ2luX2VjEKT//////////wEaCXVzLWVhc3QtMiJHasdasdaMEUCIEEvv0raHD7CahCXa95brWPyKBR8gAu7izrMAiEAsrY9sZOizSn81OZTiU0YqCDhsV79h1MdzvBIqBudLhAqngII/f//////////ARADGgwzMDk0OTI2MjM0NjkiDOEHEsy0mtHDwfAkryryAWPZbcONL+ohB7SfdQrQKcAXukkGNOb0OSLESMLL1pIP5GyPkFGKPyCs/A5dFQnJPp96u2ZuhjDerkkqKrWp0CzJHs3+GyEmfjydnbyDsOvpLm10yT1SnvluNZUf9ZgNwkb2++YxUxBIfUeW3hKHM/aDbQ85X8wswpaQ98X0BgdgV1l68PK+qw2u9b/ndzb3T5Mvw0ic063YJt3ZASZCH593+UK/dFfXOiQc4gUmd8QYYbqbpYSjncJSWoWmO/kwzy7GiEs8DVv2f3Bk9TW4BVL6xg6eACYBCrh+eD7AGq1Fxg8KdkIzgu/I67tytr0Jy5s7MIHdqooGOp0BPwISyZc6Zlq7JO5tZQ0tJNQ1eeaQPRgfppB+/+VfYu+qHiS/BZyWsHYZZ7OS50WBNxcWsLP8FhRpmoIsyCXnfWRkOhdizb5noT7F/p+nVvQp/X7F1o7/2ys9J+kZphLfp4A535fKTqkRi9dtQawJryyHgZ4zR891tO2cVAjQo7jRCFhOC3NxgpFxJmRyFvbiEtx2HkhJZr1fSSDIIQ==
    
    [federated]
    aws_access_key_id = AGGHSLISNHSHSGVMSKAD
    aws_secret_access_key = HSnDvdaSDDsdDSDtvgrMjpdWrW0wDa5YcpOzQRLaUq2e3/Ry
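
    As an alternative to editing the credentials file by hand, you can export the three values as environment variables, which the AWS CLI reads before the credentials file. A minimal sketch, assuming the jq JSON processor is installed and reusing the example role from above:

    creds=$(aws sts assume-role \
        --role-arn arn:aws:iam::30942343569:role/Administrator \
        --role-session-name Assume-role \
        --profile federated \
        --duration-seconds 43200 \
        --query Credentials --output json)
    # Pull each field out of the Credentials block.
    export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .AccessKeyId)
    export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .SecretAccessKey)
    export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .SessionToken)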
    

    Check the assumed-role identity by executing:

    aws sts get-caller-identity   
    

    You should see output such as this:

    {
       "UserId": "AROAKJHJH3THL3E5L7TE:testuser",
        "Account": "30942343569",
        "Arn": "arn:aws:sts::30942343569:assumed-role/Administrator/testuser"
    }    
    

CREATE ELASTIC CONTAINER REGISTRY (ECR)

A default container registry is already created per region in AWS. Log in to ECR:

aws ecr get-login-password --region %region-name% | docker login --username AWS --password-stdin %shared-account-id%.dkr.ecr.%region-name%.amazonaws.com 

eg.

aws ecr get-login-password --region us-west-1 | docker login --username AWS --password-stdin 30942343569.dkr.ecr.us-west-1.amazonaws.com

You should see output such as this:

Login Succeeded

NOTE: Here, logging in is an optional step and repositories can be created without logging in.

CREATE REPOSITORY AT ECR

Before Docker images can be pushed to ECR, the respective repositories must first be created:

aws ecr create-repository --repository-name install-pipeline --region %region-name%
aws ecr create-repository --repository-name tools --region %region-name%
aws ecr create-repository --repository-name data-channel-registry --region %region-name%
aws ecr create-repository --repository-name file-datasink --region %region-name%
aws ecr create-repository --repository-name file-datasource --region %region-name%
aws ecr create-repository --repository-name git-server --region %region-name%
aws ecr create-repository --repository-name kafka-datasink --region %region-name%
aws ecr create-repository --repository-name kafka-datasource --region %region-name%
aws ecr create-repository --repository-name modelops-metrics --region %region-name%
aws ecr create-repository --repository-name modelops-server --region %region-name%
aws ecr create-repository --repository-name pmml --region %region-name%
aws ecr create-repository --repository-name python --region %region-name%
aws ecr create-repository --repository-name sbrt-base --region %region-name%
aws ecr create-repository --repository-name scheduling-server --region %region-name%
aws ecr create-repository --repository-name scoring-flow --region %region-name%
aws ecr create-repository --repository-name tensorflow --region %region-name%
aws ecr create-repository --repository-name test-datasink --region %region-name%
aws ecr create-repository --repository-name test-datasource --region %region-name%
aws ecr create-repository --repository-name rest-datasink --region %region-name%
aws ecr create-repository --repository-name rest-datasource --region %region-name%
aws ecr create-repository --repository-name statistica --region %region-name%
aws ecr create-repository --repository-name jdbc-datasource --region %region-name%
aws ecr create-repository --repository-name spark --region %region-name%
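
Since these commands differ only in the repository name, they can also be issued as a single shell loop. A minimal sketch for a POSIX shell; substitute your region for %region-name%:

for repo in install-pipeline tools data-channel-registry file-datasink \
    file-datasource git-server kafka-datasink kafka-datasource \
    modelops-metrics modelops-server pmml python sbrt-base \
    scheduling-server scoring-flow tensorflow test-datasink \
    test-datasource rest-datasink rest-datasource statistica \
    jdbc-datasource spark; do
    # Create each repository in the target region.
    aws ecr create-repository --repository-name "$repo" --region %region-name%
done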

CREATE AN EKS CLUSTER (with autoscale)

ADDITIONAL VALUES REQUIRED FOR THIS STEP

  • aws_winpassword
  • aws_winuser

The aws_winpassword and aws_winuser values set the administrator credentials for any Windows Server nodes created on the cluster and must meet Windows Server password requirements.

Example: aws_winpassword=P@ssw0rd1234567! Example: aws_winuser=winuser

  • cluster

    You can name your cluster as you wish, keeping in mind the naming conventions described in registering the application. Failure to adhere to this convention will lead to an error such as this when the %cluster% parameter is used to name a DNS domain:

    a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

    Assign values to aws_winpassword, aws_winuser, and cluster. Once you have those values set, issue the following command:

    eksctl create cluster \
        --name %cluster% \
        --version 1.21 \
        --region %region-name% \
        --without-nodegroup \
        --max-pods-per-node 100  
    

This command takes several minutes. When it completes, you will see a message such as:

2021-09-22 17:43:56 [ℹ]  eksctl version 0.62.0
2021-09-22 17:43:56 [ℹ]  using region us-west-1
2021-09-22 17:43:57 [ℹ]  setting availability zones to [us-west-1b us-west-1a us-west-1a]
2021-09-22 17:43:57 [ℹ]  subnets for us-west-1b - public:192.168.0.0/19 private:192.168.96.0/19
2021-09-22 17:43:57 [ℹ]  subnets for us-west-1a - public:192.168.32.0/19 private:192.168.128.0/19
2021-09-22 17:43:57 [ℹ]  using Kubernetes version 1.21
2021-09-22 17:43:57 [ℹ]  creating EKS cluster "devmodelops" in "us-west-1" region with 
2021-09-22 17:43:57 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-1 --cluster=devmodelops'
2021-09-22 17:43:57 [ℹ]  CloudWatch logging will not be enabled for cluster "devmodelops" in "us-west-1"
2021-09-22 17:43:57 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-1 --cluster=devmodelops'
2021-09-22 17:43:57 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "devmodelops" in "us-west-1"
2021-09-22 17:43:57 [ℹ]  2 sequential tasks: { create cluster control plane "devmodelops", 2 sequential sub-tasks: { wait for control plane to become ready, 1 task: { create addons } } }
2021-09-22 17:43:57 [ℹ]  building cluster stack "eksctl-devmodelops-cluster"
2021-09-22 17:44:00 [ℹ]  deploying stack "eksctl-devmodelops-cluster"
2021-09-22 17:44:30 [ℹ]  waiting for CloudFormation stack "eksctl-devmodelops-cluster"
2021-09-22 17:45:01 [ℹ]  waiting for CloudFormation stack "eksctl-devmodelops-cluster"
2021-09-22 17:55:16 [ℹ]  waiting for CloudFormation stack "eksctl-devmodelops-cluster"
2021-09-22 17:59:34 [ℹ]  waiting for the control plane availability...
2021-09-22 17:59:34 [✔]  saved kubeconfig as "/Users/agulhane/.kube/config"
2021-09-22 17:59:34 [ℹ]  no tasks
2021-09-22 17:59:34 [✔]  all EKS cluster resources for "devmodelops" have been created

ADD NODEGROUP

Create Linux Nodegroup

eksctl create nodegroup \
    --name %cluster%-nodes \
    --nodes-min 2 \
    --nodes-max 4 \
    --node-volume-size 200 \
    --cluster %cluster% \
    --node-type t3.2xlarge \
    --region %region-name%

Create Windows Nodegroup (optional: only required for Statistica scoring service)

eksctl utils install-vpc-controllers \
    --cluster %cluster% \
    --region %region-name% \
    --approve    

eksctl create nodegroup \
    --name %cluster%"-windows-nodes \
    --region %region% \
    --cluster %cluster% \
    --node-ami-family WindowsServer2019FullContainer \
    --node-type t3.2xlarge \
    --nodes-min 2 \
    --nodes-max 4 \
    --node-volume-size 100 \
    --managed=false

This command takes several minutes. When it completes, you will see a message such as:

2021-09-22 18:01:42 [ℹ]  eksctl version 0.62.0
2021-09-22 18:01:42 [ℹ]  using region us-west-1
2021-09-22 18:01:44 [ℹ]  will use version 1.21 for new nodegroup(s) based on control plane version
2021-09-22 18:01:52 [ℹ]  nodegroup "devmodelops-nodes" will use "" [AmazonLinux2/1.21]
2021-09-22 18:01:55 [ℹ]  1 nodegroup (devmodelops-nodes) was included (based on the include/exclude rules)
2021-09-22 18:01:55 [ℹ]  will create a CloudFormation stack for each of 1 managed nodegroups in cluster "devmodelops"
2021-09-22 18:01:55 [ℹ]  2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "devmodelops-nodes" } } }
2021-09-22 18:01:55 [ℹ]  checking cluster stack for missing resources
2021-09-22 18:01:56 [ℹ]  cluster stack has all required resources
2021-09-22 18:01:56 [ℹ]  building managed nodegroup stack "eksctl-devmodelops-nodegroup-devmodelops-nodes"
2021-09-22 18:01:57 [ℹ]  deploying stack "eksctl-devmodelops-nodegroup-devmodelops-nodes"
2021-09-22 18:01:57 [ℹ]  waiting for CloudFormation stack "eksctl-devmodelops-nodegroup-devmodelops-nodes"
2021-09-22 18:02:14 [ℹ]  waiting for CloudFormation stack "eksctl-devmodelops-nodegroup-devmodelops-nodes"
2021-09-22 18:05:11 [ℹ]  waiting for CloudFormation stack "eksctl-devmodelops-nodegroup-devmodelops-nodes"
2021-09-22 18:05:13 [ℹ]  no tasks
2021-09-22 18:05:13 [✔]  created 0 nodegroup(s) in cluster "devmodelops"
2021-09-22 18:05:14 [ℹ]  nodegroup "devmodelops-nodes" has 2 node(s)
2021-09-22 18:05:14 [ℹ]  node "ip-192-168-8-66.us-west-1.compute.internal" is ready
2021-09-22 18:05:14 [ℹ]  node "ip-192-168-82-134.us-west-1.compute.internal" is ready
2021-09-22 18:05:14 [ℹ]  waiting for at least 2 node(s) to become ready in "devmodelops-nodes"
2021-09-22 18:05:15 [ℹ]  nodegroup "devmodelops-nodes" has 2 node(s)
2021-09-22 18:05:15 [ℹ]  node "ip-192-168-8-66.us-west-1.compute.internal" is ready
2021-09-22 18:05:15 [ℹ]  node "ip-192-168-82-134.us-west-1.compute.internal" is ready
2021-09-22 18:05:15 [✔]  created 1 managed nodegroup(s) in cluster "devmodelops"
2021-09-22 18:05:17 [ℹ]  checking security group configuration for all nodegroups
2021-09-22 18:05:17 [ℹ]  all nodegroups have up-to-date configuration

VERIFY NODEGROUP

Execute this command to get the nodegroups:

eksctl get nodegroup \
    --cluster %cluster% \
    --region %region-name% 

Output from the above command should look like this:

2021-09-23 16:10:24 [ℹ]  eksctl version 0.62.0
2021-09-23 16:10:24 [ℹ]  using region us-west-1
CLUSTER		NODEGROUP		    	    STATUS		    CREATED			    MIN SIZE MAX SIZE	DESIRED CAPACITY	INSTANCE TYPE	IMAGE ID		ASG NAME
qamodelops	qamodelops-nodes	    	ACTIVE		    2021-09-22T12:32:41Z	2		4		2			t3.2xlarge	AL2_x86_64		eks-qamodelops-nodes-aebe06bf-325a-49da-2b48-45e93aa74cc4
qamodelops	qamodelops-windows-nodes	CREATE_COMPLETE	2021-09-22T12:35:49Z	2		4		2			t3.2xlarge	ami-040fbe9c1ccb48d10	eksctl-qamodelops-nodegroup-qamodelops-windows-nodes-NodeGroup-1DDXS4NTJVTD8

ASSIGN IAM POLICIES

VALUES TO BE EXTRACTED FROM THIS STEP

  • rolename

Issue the following command (it uses the jq JSON processor) to extract rolename:

aws iam list-roles | jq -r '.Roles| .[] | .RoleName' | grep eksctl-%cluster%-nodegroup

You should see output such as this (two rolenames will be created if a Windows nodegroup was added; otherwise only one rolename, for Linux):

eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-1GV0F2NT37D6U
eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-9OXZ2QBHWDQ

Assign the below two IAM policies to each eksctl rolename from the output:

Rolename 1:
    aws iam attach-role-policy \
        --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-1GV0F2NT37D6U

    aws iam attach-role-policy \
        --policy-arn arn:aws:iam::aws:policy/AmazonElasticContainerRegistryPublicFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-1GV0F2NT37D6U

Rolename 2:
    aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-9OXZ2QBHWDQ

    aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonElasticContainerRegistryPublicFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-9OXZ2QBHWDQ
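
Because the rolenames are generated, you can also attach both policies to every matching role in a loop. A minimal sketch, assuming jq is installed and the example cluster name devmodelops:

for rolename in $(aws iam list-roles | jq -r '.Roles[].RoleName' \
        | grep eksctl-devmodelops-nodegroup); do
    # Attach both container registry policies to each node instance role.
    aws iam attach-role-policy --role-name "$rolename" \
        --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
    aws iam attach-role-policy --role-name "$rolename" \
        --policy-arn arn:aws:iam::aws:policy/AmazonElasticContainerRegistryPublicFullAccess
done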

CONFIGURE KUBECTL

    aws eks --region %region-name% \
        update-kubeconfig \
        --name %cluster%

Verify

    kubectl get node

This command runs quickly and should produce output such as:

    NAME                                           STATUS   ROLES    AGE   VERSION
    ip-192-168-1-209.us-west-1.compute.internal    Ready    <none>   40s   v1.21.2-eks-55daa9d
    ip-192-168-8-66.us-west-1.compute.internal     Ready    <none>   10m   v1.21.2-eks-55daa9d
    ip-192-168-82-134.us-west-1.compute.internal   Ready    <none>   10m   v1.21.2-eks-55daa9d
    ip-192-168-86-29.us-west-1.compute.internal    Ready    <none>   40s   v1.21.2-eks-55daa9d

ASSIGN TAINT TO WINDOWS NODE

VALUES TO BE EXTRACTED FROM THIS STEP

  • windowsnode

This step is only required when Windows nodes are added. Issue the following command to extract the Windows node names:

kubectl get nodes -o wide | grep Windows | awk '{print $1}'

You should see output such as this:

ip-192-168-1-209.us-west-1.compute.internal
ip-192-168-86-29.us-west-1.compute.internal

Assign the taint to the two Windows nodes:

kubectl taint nodes ip-192-168-1-209.us-west-1.compute.internal  os=windows:NoSchedule
kubectl taint nodes ip-192-168-86-29.us-west-1.compute.internal  os=windows:NoSchedule
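
Equivalently, all Windows nodes can be tainted in one command using a label selector; this sketch assumes the standard kubernetes.io/os=windows label that EKS applies to Windows nodes:

kubectl taint nodes -l kubernetes.io/os=windows os=windows:NoSchedule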

CREATE MODELOPS NAMESPACE

kubectl create namespace modelops

You should see a message:

“namespace/modelops created”

INSTALL THE SECRETS

ADDITIONAL VALUES REQUIRED FOR THIS STEP

  • elastic_pw
  • git_server_pw
  • nexus_server_pw
  • modelops_server_pw
  • scoring_admin_pw
  • Oauth2 server details

To avoid clear-text passwords, Kubernetes provides a Secrets facility. So prior to installation, Kubernetes Secrets have to be created to contain the passwords required by ModelOps.

Assign each of the names above to some value, then issue the following commands to set up the secrets.

NOTE: Without the secrets installed, the helm installation step will wait and eventually time out.

Clear out any old secrets that may exist with this series of delete secret commands (if a secret does not exist yet, kubectl reports an error that can be safely ignored, or add --ignore-not-found to each command):

kubectl delete secret git-server --namespace modelops
kubectl delete secret nexus-server --namespace modelops
kubectl delete secret modelops-server --namespace modelops
kubectl delete secret scoring-admin --namespace modelops
kubectl delete secret oauth2 --namespace modelops

Create new secrets:

kubectl create secret generic elasticsearch-es-elastic-user \
    --from-literal=elastic=%elastic_pw% \
    --namespace modelops --dry-run=client --output=yaml 2>/dev/null > secret.yaml
kubectl apply --filename secret.yaml
kubectl create secret generic git-server \
    --from-literal=modelops=%git_server_pw% \
    --namespace modelops
kubectl create secret generic nexus-server \
    --from-literal=admin=%nexus_server_pw% \
    --namespace modelops
kubectl create secret generic modelops-server \
    --from-literal=admin=%modelops_server_pw% \
    --namespace modelops
kubectl create secret generic scoring-admin \
    --from-literal=admin=%scoring_admin_pw% \
    --namespace modelops

The oauth2 secret depends on the type of authentication server used.

For Azure, the authentication server administrator should supply the Azure tenant id (%azure_tenant_id%), app id (%azure_app_id%) and client secret (%azure_client_secret%). The secret is created with:

kubectl create secret generic oauth2 \
    --from-literal=TENANT_ID=%azure_tenant_id% \
    --from-literal=CLIENT_ID=%azure_app_id% \
    --from-literal=CLIENT_SECRET=%azure_client_secret% \
    --namespace modelops

For Cognito, the authentication server administrator should supply the Cognito region (%cognito_region%), pool id (%cognito_pool_id%), client id (%cognito_client_id%), client secret (%cognito_client_secret%) and the domain (%cognito_domain%). The secret is created with:

kubectl create secret generic oauth2 \
    --from-literal=REGION=%cognito_region% \
    --from-literal=POOL_ID=%cognito_pool_id% \
    --from-literal=CLIENT_ID=%cognito_client_id% \
    --from-literal=CLIENT_SECRET=%cognito_client_secret% \
    --from-literal=DOMAIN=%cognito_domain% \
    --namespace modelops
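
Whichever authentication server you use, you can confirm that the secrets exist before installing the chart:

kubectl get secrets --namespace modelops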

INSTALL THE HELM CHART

ADDITIONAL VALUES REQUIRED FOR THIS STEP

  • hostedzone

    A real-world value would be a domain that you bought from a domain name registrar. Eg: xyzcloud.com

  • modelops_home

    The directory where ModelOps is installed on your machine. For instance, on Windows, c:\tibco\modelops\n.m

  • name

    This is NOT the display name of the application registered at the beginning of this process. Rather, this is usually set to be the same as %cluster%

  • domain

    Assign to this the combination of existing parameters: %cluster%.%hostedzone%

  • network_exposure

    Assign to this the string “ingress” (without the quotes)

  • oauth2_server_type

    One of azure or cognito to use an external Oauth2 authentication server.
    Leave blank if no authentication server is required.

  • eks-role-arn

    If the optional eks.externalDNS is set to aws, eks-role-arn must be a valid AWS role with permissions to update Route53. See external-dns for more information.

Optional: If you wish to display the helm chart's default values, run this command:

helm show values kubernetes-installer-1.0.2.tgz

Assign each of the names above to an appropriate value, then issue the following command:

helm upgrade \
    --install installer %modelops_home%/helm-charts/kubernetes-installer-1.0.2.tgz \
    --atomic \
    --set cloud=eks \
    --set eks.externalDNS=aws \
    --set externalDNS.aws.eksRoleArn=%eks-role-arn% \
    --set eks.containerRegistry=%shared-account-id%.dkr.ecr.%region-name%.amazonaws.com \
    --namespace modelops \
    --set eks.networkExposure=%network_exposure% \
    --set eks.ingressDomain=%domain% \
    --set eks.oauth2=%oauth2_server_type% \
    --timeout 10m0s

The above command will produce a series of warnings, which can be ignored, followed by a series of lines of output, including a thank you, and ending with a note on how to track the progress of the installation pipeline. See monitoring the installation for more details.
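
You can also inspect the release at any point with the standard helm status command:

helm status installer --namespace modelops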

COPY MAVEN ARTIFACTS (using kubectl)

On Windows, you must change directory to the maven artifacts subdirectory of your ModelOps installation to run this command.

cd c:\tibco\modelops\n.m\maven-repository-artifacts

On Mac/Linux you can give a fully qualified path name to the mavenrepo.zip file.

kubectl cp \
    modelops-repo-1.2.0-mavenrepo.zip \
    mavenrepo-0:/tmp/ \
    --namespace modelops

This command takes some time to run, and gives no output.
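
To confirm the copy completed, you can list the target directory inside the pod (this assumes the container image provides an ls binary):

kubectl exec mavenrepo-0 --namespace modelops -- ls -l /tmp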

MONITOR INSTALLATION PROCESS (with tkn)

tkn pipelinerun logs bootstrap --follow --namespace modelops

This pipelinerun command takes some time to complete and gives copious amounts of output. When this command completes, you can then ask to see the task list:

tkn taskrun list --namespace modelops

UPDATE DNS

This step is only required if automatic updating of DNS is not enabled with the eks.externalDNS installation option.

VALUES TO BE EXTRACTED FROM THIS STEP

  • ingress_lb_ip
  • hosted_zone_id

  1. Issue the following command to extract ingress_lb_ip:
    kubectl get service/nginx-ingress-ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' --namespace modelops
    

You should see output such as this:

    a754a322be44b4d68b27eeewwwe103ce583-731052421.us-west-1.elb.amazonaws.com%

Assign the output of the above command to ingress_lb_ip (without the % sign). Example: set ingress_lb_ip=a754a322be44b4d68b27eeewwwe103ce583-731052421.us-west-1.elb.amazonaws.com

  2. Issue the following command to extract hosted_zone_id:
    aws route53 list-hosted-zones-by-name | jq --arg name "${domain}." -r '.HostedZones | .[] | select(.Name=="\($name)") | .Id'
    

You should see output such as this:

    /hostedzone/Z0415WEWR1C17NACM6JCG

Assign the output of the above command to hosted_zone_id. Example: set hosted_zone_id=/hostedzone/Z0415WEWR1C17NACM6JCG

Once the ingress_lb_ip and hosted_zone_id parameters are set, modify the record sets by issuing the following command:

aws route53 change-resource-record-sets \
        --hosted-zone-id ${hosted_zone_id}  \
        --change-batch '{ "Comment": "Updating a record",
                          "Changes": [ {
                                "Action": "UPSERT",
                                "ResourceRecordSet": {
                                    "Name": "modelops-server.'"$domain"'",
                                    "Type": "CNAME",
                                    "TTL": 300,
                                    "ResourceRecords": [ {
                                        "Value": "'"$ingress_lb_ip"'" } ] } } ] }'

You should see output such as this:

    {
       "ChangeInfo": {
        "Id": "/change/C08087523NGIM5VVMOYKU",
        "Status": "PENDING",
        "SubmittedAt": "2021-10-13T13:50:20.565000+00:00",
        "Comment": "Updating a record"
        }
    }
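
If the inline JSON quoting is awkward in your shell (particularly on Windows), the change batch can instead be supplied as a file using the AWS CLI's file:// syntax. A sketch, where upsert.json is a hypothetical file holding the same change batch as above (replace %domain% and %ingress_lb_ip% with your values):

{ "Comment": "Updating a record",
  "Changes": [ {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "modelops-server.%domain%",
            "Type": "CNAME",
            "TTL": 300,
            "ResourceRecords": [ { "Value": "%ingress_lb_ip%" } ] } } ] }

Then issue:

aws route53 change-resource-record-sets \
    --hosted-zone-id ${hosted_zone_id} \
    --change-batch file://upsert.json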
  3. If the hosted zone is not present, you must first create it:
    aws route53 create-hosted-zone \
        --name ${domain} \
        --caller-reference $RANDOM \
        --hosted-zone-config \
        Comment="Creating HostedZone"
    

    Issue the following command to extract hosted_zone_id:

    aws route53 list-hosted-zones-by-name | jq --arg name "${domain}." -r '.HostedZones | .[] | select(.Name=="\($name)") | .Id'
    

You should see output such as this:

    /hostedzone/Z0415WEWR1C17NACM6JCG

Assign the output of the above command to hosted_zone_id. Example: set hosted_zone_id=/hostedzone/Z0415WEWR1C17NACM6JCG

Once your ingress_lb_ip and hosted_zone_id parameters are set, modify the record sets by issuing the following command:
     
    aws route53 change-resource-record-sets \
        --hosted-zone-id ${hosted_zone_id}  \
         --change-batch '{ "Comment": "Creating a record",
                          "Changes": [ {
                                "Action": "CREATE",
                                "ResourceRecordSet": {
                                    "Name": "modelops-server.'"$domain"'",
                                    "Type": "CNAME",
                                    "TTL": 300,
                                    "ResourceRecords": [ {
                                        "Value": "'"$ingress_lb_ip"'" } ] } } ] }'  

You should see output such as this:

    {
       "ChangeInfo": {
        "Id": "/change/C04087802ADO55OR1VNCN",
        "Status": "PENDING",
        "SubmittedAt": "2021-10-12T07:53:21.930000+00:00",
        "Comment": "Creating a record"
        }
    }

If API v1 integration with Team Studio is required, the modelops-1 host should also be added to DNS in a similar way:

    aws route53 change-resource-record-sets \
        --hosted-zone-id ${hosted_zone_id}  \
         --change-batch '{ "Comment": "Creating a record",
                          "Changes": [ {
                                "Action": "CREATE",
                                "ResourceRecordSet": {
                                    "Name": "modelops-1.'"$domain"'",
                                    "Type": "CNAME",
                                    "TTL": 300,
                                    "ResourceRecords": [ {
                                        "Value": "'"$ingress_lb_ip"'" } ] } } ] }'  

CLUSTER MANAGEMENT

Scaling the cluster's nodegroups down to zero stops all worker nodes, allowing you to save on node compute costs while maintaining all your objects and cluster state for when you scale them up again. You can then pick up right where you left off after a weekend, or only run your nodes while you run your batch jobs.

Limitations

When scaling nodegroups down and up, the following restrictions apply:

  • Only the worker nodes are stopped; the EKS control plane keeps running (and continues to be billed) and cannot be stopped.
  • Pods running on scaled-down nodes are terminated; workloads are rescheduled when the nodes return.
  • To perform any other operation, such as an upgrade, scale your nodegroups back up first.

Stop Cluster

You can use the eksctl command to stop a running EKS cluster's nodes by scaling its nodegroup to zero. The following example stops a cluster's nodes:

eksctl scale nodegroup --cluster %cluster% --name %cluster%-nodes --nodes 0 --nodes-max 1 --nodes-min 0

Start Cluster

You can use the eksctl command to start a stopped EKS cluster's nodes by scaling its nodegroup back up. Workloads are rescheduled onto the restored nodes.

The following example starts a cluster's nodes:

eksctl scale nodegroup --cluster %cluster% --name %cluster%-nodes --nodes 2 --nodes-max 4 --nodes-min 2
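
You can watch the nodes rejoin the cluster and become Ready with:

kubectl get nodes --watch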

NOTE: It takes around 6-7 minutes for the nodes to completely provision and reach Ready status, which ensures that your cluster is up and running.

CLEANUP

Use of AWS services costs money. If you are not using your services any longer, you should clean up and remove them. For these delete commands to succeed, you must use the names that you created in your creation steps. To check what names you have, you have two options:

a) log in via command line and issue the following command:

  aws eks list-clusters --region %region-name%

The output from this option should look like this:

{
    "clusters": [
        "devmodelops",
        "sbxmodelops"
    ]
}

b) log in to the AWS console and open the Elastic Kubernetes Service page.

Once you have your name set correctly, detach the assigned policies:

VALUES TO BE EXTRACTED FROM THIS STEP

  • rolename

Issue the following command to extract rolename:

aws iam list-roles | jq -r '.Roles| .[] | .RoleName' | grep eksctl-%cluster%-nodegroup

You should see output such as this (two rolenames will be displayed if a Windows nodegroup was added; otherwise only one rolename, for Linux):

eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-1GV0F2NT37D6U
eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-9OXZ2QBHWDQ

Detach the below two IAM policies from each eksctl rolename:

Rolename 1:
    aws iam detach-role-policy \
        --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-1GV0F2NT37D6U

    aws iam detach-role-policy \
        --policy-arn arn:aws:iam::aws:policy/AmazonElasticContainerRegistryPublicFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-1GV0F2NT37D6U

Rolename 2:
    aws iam detach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-9OXZ2QBHWDQ

    aws iam detach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonElasticContainerRegistryPublicFullAccess \
        --role-name eksctl-devmodelops-nodegroup-devm-NodeInstanceRole-9OXZ2QBHWDQ

To delete the cluster resources, issue the following command:

eksctl delete cluster \
    --name %cluster%  \
    --region %region-name% \
    --wait \
    --force
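
The ECR repositories created earlier are not removed by deleting the cluster. A minimal sketch to delete them in a loop (--force also deletes any images still stored in each repository; substitute your region):

for repo in install-pipeline tools data-channel-registry file-datasink \
    file-datasource git-server kafka-datasink kafka-datasource \
    modelops-metrics modelops-server pmml python sbrt-base \
    scheduling-server scoring-flow tensorflow test-datasink \
    test-datasource rest-datasink rest-datasource statistica \
    jdbc-datasource spark; do
    aws ecr delete-repository --repository-name "$repo" --region %region-name% --force
done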

VARIABLE REFERENCE

Variable, source of value, and description:

  • ecr

    AWS Elastic Container Registry (ECR) provides cloud-based container image building and storage for platforms including Linux and Windows. Use Elastic Container Registry with your existing container development and deployment pipelines. This is a user-selected meaningful name which the user creates, as explained above, while creating the ECR repositories.

  • cluster

    Elastic Kubernetes Service (EKS) is a managed Kubernetes service that lets users quickly deploy, scale, and manage clusters. It reduces the complexity of deployment and core management tasks, including coordinating upgrades. The EKS control plane is managed by the AWS platform, and users only pay for the EKS nodes that run their applications. EKS is built on open-source Kubernetes. The user gives the cluster name while creating the EKS cluster.

  • namespace

    Kubernetes resources, such as pods and Deployments, are logically grouped into a namespace. These groupings provide a way to logically divide an EKS cluster and restrict access to create, view, or manage resources. Users can create namespaces to separate business groups. At namespace creation, a user can give the namespace any value, e.g. development, production, etc.

  • location

    The AWS region in which your organization's resources are hosted. You may choose your region based on locality and network latency, or because you have sovereignty requirements for data centers. Eg: us-east-2

  • aws_winpassword and aws_winuser

    These values set the admin credentials for any Windows Server containers created on the cluster and must meet Windows Server password requirements. Eg: aws_winpassword=P@ssw0rd1234567! and aws_winuser=winuser

  • hostedzone

    The name of the DNS zone used to create a hosted zone in Route53. The hostedzone name can be any value that is not already configured in AWS Route53. A real-world value would be a domain that you bought from a domain name registrar. Eg: xyzcloud.com

  • domain

    The public DNS name exposed to access the ModelOps service; the DNS name registered in the hosted zone for the respective EKS cluster. Eg: devmodelops.tibcocloud.com

  • networkExposure

    The type in which the user requires the k8s service to be exposed. ModelOps currently uses "ingress". Eg: ingress, loadBalancer, etc.

  • region-name

    The AWS region in which the EKS service is deployed, e.g. us-east-2

TROUBLESHOOTING REFERENCE

Helm install failure on pre-install

Error: an error occurred while uninstalling the release. original install error:
failed pre-install: timed out waiting for the condition: timed out waiting for the condition

This error indicates that a pre-install task failed - this part of the install process installs the operators, so chances are something failed in there. The command kubectl logs job/operators -n modelops should give some more info.

One possible cause of this error is a problem installing secrets.

Helm install failure

Error: rendered manifests contain a resource that already exists.  Unable to continue with install:
Namespace "production" in namespace "" exists and cannot be imported into the current release:
invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to
"Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "modelops";
annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "production"

This error is caused by using a namespace value of “production”.

The ModelOps product reserves these namespaces and thus those names are not available for use:

  • development
  • testing
  • production
  • datachannels
  • modelops

Copy Maven Artifacts

On Windows, without the change directory, you can expect this error:

error: modelops-repo-1.2.0-mavenrepo.zip doesn't exist in local filesystem

On Windows, an attempt to give a fully qualified path name to the mavenrepo.zip on the C drive can expect this error:

error: one of src or dest must be a local file specification

Resolving DNS records

The following steps help you investigate why DNS resolution is failing for a DNS record in a hosted zone in AWS Route53.

  • Confirm that the DNS records have been configured correctly in AWS Route53. Review the DNS records in the AWS console, checking that the hosted zone name, record name, and record type are correct.
  • Be sure to specify the correct name servers for your hosted zone, as shown in the AWS console.
  • Check that the DNS name is correct (you have to specify the fully qualified name, including the zone name) and the record type is correct.
  • Confirm that the DNS domain name has been correctly delegated to the AWS Route53 name servers. More information on delegation can be found in the AWS Route53 documentation.

HELM CHART REFERENCE

$ helm show values kubernetes-installer-1.0.2.tgz
#
# Default values for the chart
#

#
# cloud environment
#
cloud: docker-for-desktop

#
# image pull policy
#
pullpolicy:           "IfNotPresent"

#
# sizing
#
size: medium

#
# operator lifecycle manager specific settings
#
olm:
  operatorVersion:    "v0.17.0"

#
# tekton specific settings
#
tekton:
  operatorVersion:    "latest"

#
# nexus specific settings
#
nexus:
  operatorVersion:    "v0.6.0"
  internalPort:       80
  nodePort:           30020
  containerNodePort:  30030
  hostname:           "artifact-repository"
  maven:
    maven-proxy:
      url:            "https://repo1.maven.org/maven2/"
  pypi:
    pypi-proxy:
      url:            "https://pypi.org/"
  yum:
    yum-proxy:
      url:            "https://repo.almalinux.org/almalinux"
 
#
# The following values are defaulted depending on cloud type :
#
# installOLM - install the operator lifecycle manager
#
# containerRegistry - base URI of container registry.  Use the supplied one 
#   if available.
#
# containerUsername/containerPassword - if set, used to access container registry
#
# networkExposure - mechanism to use to expose network
#
# createPVC - if true create persistent volume claim in helm chart, if false 
#   the persistent volume claim must be created before installing the chart.
#
# selfSignedRegistry - if true then skip tls verification on registry
#
# httpRegistry - if true then use http registry
#
# roleBasedAccessControl - kubernetes or openshift
#
# windows - if true build windows container (currently statistica scoring server)
#
# dnsSuffix - AKS only, set azure annotation for public dns name, ie <container>-<dnsSuffix>.<region>.cloudapp.azure.com
#

docker-for-desktop:
  installOLM:         true
  installMetrics:     true
  installLogs:        true
  containerRegistry:  "localhost:5000"
  networkExposure:    "nodePort"
  createPVC:          true
  httpRegistry:       true
  selfSignedRegistry: false
  roleBasedAccessControl: "kubernetes"
  windows:            false
  ingressDomain:      "tobeset"

kind:
  installOLM:         true
  installMetrics:     true
  installLogs:        true
  containerRegistry:  "kind-registry:5000"
  networkExposure:    "ingress"
  createPVC:          true
  selfSignedRegistry: false
  httpRegistry:       true
  roleBasedAccessControl: "kubernetes"
  windows:            false
  ingressDomain:      "tobeset"

colima:
  installOLM:         true
  installMetrics:     true
  installLogs:        true
  containerRegistry:  "localhost:5000"
  networkExposure:    "nodePort"
  createPVC:          true
  httpRegistry:       true
  selfSignedRegistry: false
  roleBasedAccessControl: "kubernetes"
  windows:            false
  ingressDomain:      "tobeset"

openshift:
  installOLM:         false
  installMetrics:     true
  installLogs:        true
  containerRegistry:  "image-registry.openshift-image-registry.svc:5000/{{ .Release.Namespace }}"
  networkExposure:    "route"
  createPVC:          true
  selfSignedRegistry: true
  httpRegistry:       false
  roleBasedAccessControl: "openshift"
  windows:            false
  ingressDomain:      "tobeset"

aks:
  installOLM:         true
  installMetrics:     true
  installLogs:        true
  containerRegistry:  "myregistry.azurecr.io"
  containerUsername:  "azure appid"
  containerPassword:  "azure password"
  azureTenantId:      "azure tenantId"
  networkExposure:    "ingress"
  createPVC:          true
  selfSignedRegistry: false
  httpRegistry:       true
  roleBasedAccessControl: "kubernetes"
  windows:            true
  ingressDomain:      "tobeset"
  # oauth2:             "azure"

eks:
  installOLM:         true
  installMetrics:     true
  installLogs:        true
  containerRegistry:  "eks registry"
  region:             "region"
  networkExposure:    "ingress"
  createPVC:          true
  selfSignedRegistry: false
  httpRegistry:       true
  roleBasedAccessControl: "kubernetes"
  windows:            false
  ingressDomain:      "tobeset"
  # oauth2:             "cognito"

#
# sizing details
#
small:
  general:
    cpu: "2"
    memory: "400Mi"
  nexus:
    disk: "20Gi"
    memory: "2Gi"
  elasticsearch:
    disk: "10Gi"
    memory: "2Gi"
  prometheus:
    interval: "30s"
    disk: "10Gi"

medium:
  general:
    cpu: "2"
    memory: "400Mi"
  nexus:
    disk: "20Gi"
    memory: "2Gi"
  elasticsearch:
    disk: "50Gi"
    memory: "5Gi"
  prometheus:
    interval: "10s"
    disk: "50Gi"

large:
  general:
    cpu: "2"
    memory: "400Mi"
  nexus:
    disk: "20Gi"
    memory: "2Gi"
  elasticsearch:
    disk: "100Gi"
    memory: "10Gi"
  prometheus:
    interval: "10s"
    disk: "100Gi"

#
# hence the chart may be installed :
#
#   helm install installer kubernetes-installer-[version].tgz --set cloud=openshift
#
# or override individual settings
#
#   helm install installer kubernetes-installer-[version].tgz --set cloud=openshift --set openshift.createPVC=true
#

#
# Kubernetes DNS domain - not generally used but needed for windows work-arounds (see TMO-1156)
#
clusterName:              "svc.cluster.local"

#
# prometheus specific settings
#
# if storageClass is set, use storageClass in volumeClaimTemplate (otherwise system default is used)
#
# See https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects for retention time
#
prometheus:
  operatorVersion:    "30.0.1"
  nodePort:           30050
  retention:          "1y"
  storageClass:       ""
# see https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfigspec
#  alerts:
#    route:
#      groupBy: ['job']
#      receiver: "test"
#    receivers:
#    - name: "test"
#      emailConfigs:
#      - to: plord@tibco.com
#        from: plord@tibco.com
#        smarthost: smtp-relay.gmail.com:587

#
# elasticsearch specific settings
#
elasticsearch:
  operatorVersion:    "1.9.1"
  version:            "7.16.2"
  nodePort:           30070
  username:           "elastic"

#
# kibana specific settings
#
kibana:
  version:            "7.16.2"
  nodePort:           30080
  operatorVersion:    "1.9.1"

#
# ingress nginx specific settings
#
ingressnginx:
  version:            "4.0.1"

#
# cert manager specific settings
#
certmanager:
  version:            "v1.6.1"

#
# Oauth2
# 
oauth2:

  azure:
    # oauth2 values for azure
    #
    # need a secret "oauth2" with 
    #
    # TENANT_ID set to azure tenantid
    # CLIENT_ID set to azure application id
    # CLIENT_SECRET set to azure client secret
    #
    identityAttributeName:    "unique_name"
    roleAttributeName:        "roles"
    jwtAudience:              "${CLIENT_ID}"
    jwtIssuer:                "https://sts.windows.net/${TENANT_ID}/"
    jwksURL:                  "https://login.microsoftonline.com/common/discovery/keys"
    jwksCacheTimeoutSeconds:  "3600"
    ssoLogoutURL:             "https://login.microsoftonline.com/${TENANT_ID}/oauth2/logout?post_logout_redirect_uri=https://modelops-server.${MODELOPS_DOMAIN}/oauth2/sign_out"
    # oauth2-proxy settings - see https://oauth2-proxy.github.io/oauth2-proxy/docs/
    provider:                 "azure"
    emailclaim:               "unique_name"
    azuretenant:              "${TENANT_ID}"
    oidcissuerurl:            "https://sts.windows.net/${TENANT_ID}/"
    extrajwtissuers:          "https://login.microsoftonline.com/${TENANT_ID}/v2.0=${CLIENT_ID}"
    clientid:                 "${CLIENT_ID}"
    clientsecret:             "${CLIENT_SECRET}"
    whitelist:                "login.microsoftonline.com/${TENANT_ID}"

  cognito:
    # oauth2 values for amazon cognito
    #
    # need a secret "oauth2" with 
    #
    # REGION set to cognito region
    # POOL_ID set to cognito pool id
    # CLIENT_ID set to cognito client id
    # CLIENT_SECRET set to cognito client secret
    # DOMAIN set to cognito domain
    #
    identityAttributeName:    "email"
    roleAttributeName:        "cognito:groups"
    jwtAudience:              "${CLIENT_ID}"
    jwtIssuer:                "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
    jwksURL:                  "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}/.well-known/jwks.json"
    jwksCacheTimeoutSeconds:  "3600"
    ssoLogoutURL:             "https://${DOMAIN}.auth.${REGION}.amazoncognito.com/logout?client_id=${CLIENT_ID}&logout_uri=https://modelops-server.${MODELOPS_DOMAIN}/oauth2/sign_out"
    # oauth2-proxy settings - see https://oauth2-proxy.github.io/oauth2-proxy/docs/
    provider:                 "oidc"
    emailclaim:               "email"
    oidcissuerurl:            "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
    clientid:                 "${CLIENT_ID}"
    clientsecret:             "${CLIENT_SECRET}"
    whitelist:                "tibco-modelops.auth.${REGION}.amazoncognito.com"