Cluster Sizing Summary

This section presents cluster sizing recommendations based on various tests performed on TIBCO Cloud™ API Management - Local Edition 5.x. Some tests used the default deployment scripts shipped with API Management - Local Edition, while others used customized deployment files (not part of the default scripts). Customizations included running only Traffic Managers on dedicated nodes in a cluster, with the other components on the remaining nodes of the same cluster. The following tables can be used as a guideline for creating K8s clusters that meet your performance requirements, expressed in Transactions per Second (TPS). These tests were performed on Local Edition 5.x with variations in response size and backend latency.

Test - Part 1

  1. All the nodes in each cluster had a 4-core CPU and 15 GB of memory.
  2. Load generation: 8 JMeter hosts in US West1b (GCP).
  3. Backend: 8 latency-injector hosts in US West1b.
  4. Local Edition cluster region: US Central1 (GCP).
  5. These tests were performed while the TMs used the getStats method on the memcache servers for time synchronization (the default behavior, as of Local Edition 5.3.1).
The following table shows TPS by cluster type and by backend response size (latency). Each cell lists Unprotected / Protected (OAuth) TPS.

| Cluster Type | 2b (0 ms) | 1kb (100 ms) | 256kb (500 ms) | 1-64kb (100-300 ms) | 1-8kb (30-180 ms) | 4-128kb (100-300 ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Xtra Small | 734 / 781 | 723 / 571 | 265 / 229 | 688 / 610 | 761 / 545 | 647 / 385 |
| Small-1 | 1920 / 1800 | 1600 / 1460 | 692 / 590 | 1470 / 1200 | 1850 / 1320 | 1420 / 1300 |
| Small-2 | 1300 / 1230 | 1270 / 1100 | 366 / 283 | 1000 / 913 | 1330 / 733 | 1000 / 723 |
| Medium-1 | 2300 / 1600 | 2100 / 1900 | 909 / 683 | 1870 / 1470 | 2100 / 2000 | 1950 / 1850 |
| Medium-2 | 4400 / 3300 | 3500 / 3300 | 1300 / 12 | 3800 / 3400 | 4200 / 2500 | 3500 / 3000 |
| Large-1 | 2300 / 2200 | 1800 / 1470 | 1420 / 1100 | 2500 / 2300 | 2300 / 1700 | 1750 / 1700 |
| Large-2 | 4000 / 3700 | 3850 / 3300 | 1500 / 1400 | 3500 / 3000 | 4000 / 3380 | 3000 / 2900 |
Description of the Topology
| Topology | Description | NoSQL Count | Configuration Manager Count | Log Count | SQL Count | Cache Count | TM Count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Xtra Small | K8s worker nodes: 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Small-1 | K8s worker nodes: 2 | 1 | 1 | 1 | 1 | 2 | 3 |
| Small-2 | K8s worker nodes: 2. One node dedicated to the Traffic Manager, with the remaining containers running on the other node. | 1 | 1 | 1 | 1 | 1 | 1 |
| Medium-1 | K8s worker nodes: 3 | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 10 (max 4 per node) |
| Medium-2 | Same as Medium-1, but each node has double the capacity, i.e. 8 cores and 30 GB. | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 10 (max 4 per node) |
| Medium-3 | Similar to Medium-2: a cluster of 3 nodes with 3 TMs (as in Medium-1), but each node has 2 cores and 8 GB. This test was done to obtain numbers for licensing a total of 6 cores. | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 3 (max 4 per node) |
| Large-1 | K8s worker nodes: 5 | 3 (max 1 per node) | 1 | 5 (max 1 per node) | 1 | 3 (max 1 per node) | 20 (max 4 per node) |
| Large-2 | K8s worker nodes: 6. Three nodes dedicated to all 15 TMs; all remaining components run on the remaining 3 nodes. | 3 (max 1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (max 1 per node) | 15 (max 5 per node) |

Test - Part 2

In the first part of the test, a limit of 20-21K TPS was reached for the extra large cluster, and TPS did not increase linearly as the cluster scaled horizontally. Further analysis determined that, for each request, each TM was connecting to every connected memcache server to get a time reference for quota enforcement, which created a bottleneck. Better throughput can be achieved by using the system time of the TMs instead and ensuring that all the servers are in sync (using NTP or another mechanism). The property that switches between these two mechanisms is available in the TM but is not exposed through the TM property file; it is set by using the tml_tm_properties.json deployment property file. To use the cache servers as a shared time reference, set the use_system_time property to false; to use the TMs' system time, set it to true.
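For illustration, here is a minimal sketch of the relevant entry in tml_tm_properties.json, assuming the file is a flat JSON object of TM properties (the structure of the template shipped with your deployment may differ):

```json
{
  "use_system_time": true
}
```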

Create a K8s secret from this property file and overwrite the existing template inside the TM container. With this tweak, a TPS of 110K-115K was achieved for a specific cluster. In this series of tests, extra large clusters were created with node counts varying from 8 to 30. In each cluster, TMs ran on dedicated nodes, log containers ran on dedicated nodes, and the other container types shared the remaining nodes. Using the K8s anti-affinity feature, no two NoSQL, cache, log, or TM containers ran on the same node. CPU utilization was around 55-60% on the TM nodes when these maximum TPS values were reached.
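The secret-creation step could look like the following sketch; the secret name and namespace are hypothetical, so match them to whatever your TM deployment actually references:

```sh
# Hypothetical secret name and namespace; the TM deployment must mount this
# secret over the existing tml_tm_properties.json template inside the container.
kubectl create secret generic tml-tm-properties \
  --from-file=tml_tm_properties.json \
  --namespace=tml
```

The details of the tests are shown in the following table.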
| Cluster | Topology | TPS (Unprotected) |
| --- | --- | --- |
| Extra Large-1 | 10 nodes. 5 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 3 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 40,000 |
| Extra Large-2 | 15 nodes. 8 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 3 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 55,000 |
| Extra Large-3 | 20 nodes. 13 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 5 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 85,000 |
| Extra Large-4 | 27 nodes. 20 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 5 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 110,000 |

The deployment files were customized so that each TM and each log container ran on its own separate node, while the NoSQL, SQL, CM, and cache containers shared the remaining nodes. Even while sharing nodes, each NoSQL and each cache container ran on a separate node. Pod anti-affinity rules were combined with node labels, and each cluster was divided into three groups of nodes: one group was labeled deploy=tm, where only TMs were deployed; the second group was labeled deploy=log, where only log containers were deployed; and the third group was labeled deploy=other, for the remaining components. These labels were used in the corresponding deployment files under the nodeSelector attribute. Contact Local Edition Support for the customized deployment files; they are delivered as a compressed system folder, usually found in the deployment folder, containing the updated/customized YAML files.
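As a sketch of how the node labels and pod anti-affinity fit together (the deployment name, app label, and image below are hypothetical; only the deploy=tm label value comes from the text above):

```yaml
# Hypothetical fragment of a customized TM deployment; names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tml-tm
spec:
  replicas: 5
  selector:
    matchLabels:
      app: tml-tm
  template:
    metadata:
      labels:
        app: tml-tm
    spec:
      nodeSelector:
        deploy: tm   # schedule TM pods only on nodes labeled deploy=tm
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: tml-tm
              topologyKey: kubernetes.io/hostname   # at most one TM pod per node
      containers:
        - name: tml-tm
          image: tml-tm:5.x   # placeholder image name
```

The node labels themselves can be applied with, for example, `kubectl label nodes <node-name> deploy=tm`.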