Cluster Sizing Summary

This section presents cluster sizing recommendations based on various tests performed on TIBCO Cloud™ API Management - Local Edition 5.x. Some tests used the default deployment scripts shipped with API Management - Local Edition, while others used customized deployment files (not part of the default scripts). Customizations included running only Traffic Managers on dedicated nodes in a cluster, with the other components on the remaining nodes of the same cluster. The following tables can be used as a guideline for creating K8s clusters that meet your performance requirements, expressed in Transactions per Second (TPS). These tests were performed on Local Edition 5.x with variations in response size and backend latency.

Test - Part 1

  1. All the nodes in each cluster had a 4-core CPU and 15 GB of memory.
  2. Load generation: 8 JMeter hosts in US West1b (GCP).
  3. Backend: 8 latency-injector hosts in US West1b.
  4. Local Edition cluster region: US Central1 (GCP).
  5. These tests were performed while the TMs used the getStats method on the memcache servers for time synchronization (the default behavior, as of Local Edition 5.3.1).
The following table shows TPS by cluster type and by backend response size (latency). Each cell lists Unprotected / Protected (OAuth) TPS.

| Cluster Type | 2b (0 ms) | 1kb (100 ms) | 256kb (500 ms) | 1-64kb (100-300 ms) | 1-8kb (30-180 ms) | 4-128kb (100-300 ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Xtra Small | 734 / 781 | 723 / 571 | 265 / 229 | 688 / 610 | 761 / 545 | 647 / 385 |
| Small-1 | 1920 / 1800 | 1600 / 1460 | 692 / 590 | 1470 / 1200 | 1850 / 1320 | 1420 / 1300 |
| Small-2 | 1300 / 1230 | 1270 / 1100 | 366 / 283 | 1000 / 913 | 1330 / 733 | 1000 / 723 |
| Medium-1 | 2300 / 1600 | 2100 / 1900 | 909 / 683 | 1870 / 1470 | 2100 / 2000 | 1950 / 1850 |
| Medium-2 | 4400 / 3300 | 3500 / 3300 | 1300 / 12 | 3800 / 3400 | 4200 / 2500 | 3500 / 3000 |
| Large-1 | 2300 / 2200 | 1800 / 1470 | 1420 / 1100 | 2500 / 2300 | 2300 / 1700 | 1750 / 1700 |
| Large-2 | 4000 / 3700 | 3850 / 3300 | 1500 / 1400 | 3500 / 3000 | 4000 / 3380 | 3000 / 2900 |
Description of the Topology
| Topology | Description | NoSQL Count | Configuration Manager Count | Log Count | SQL Count | Cache Count | TM Count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Xtra Small | K8s worker nodes: 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Small-1 | K8s worker nodes: 2 | 1 | 1 | 1 | 1 | 2 | 3 |
| Small-2 | K8s worker nodes: 2. One node dedicated to the Traffic Manager, with the remaining containers running on the other node. | 1 | 1 | 1 | 1 | 1 | 1 |
| Medium-1 | K8s worker nodes: 3 | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 10 (max 4 per node) |
| Medium-2 | Same as Medium-1, but each node has double the capacity, i.e. 8 cores and 30 GB. | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 10 (max 4 per node) |
| Medium-3 | Similar to Medium-2: a cluster of 3 nodes with 3 TMs (as in Medium-1), but each node has 2 cores and 8 GB. This test was done to obtain numbers for licensing a total of 6 cores. | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 3 (max 4 per node) |
| Large-1 | K8s worker nodes: 5 | 3 (max 1 per node) | 1 | 5 (max 1 per node) | 1 | 3 (max 1 per node) | 20 (max 4 per node) |
| Large-2 | K8s worker nodes: 6. Three nodes dedicated to all 15 TMs; all remaining components run on the remaining 3 nodes. | 3 (max 1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (max 1 per node) | 15 (max 5 per node) |

Test - Part 2

In the first part of the test, a limit of 20-21K TPS was reached for the extra large cluster, and TPS did not increase linearly as the cluster scaled horizontally. Further analysis determined that, for each request, each TM was connecting to every connected memcache server to get a time reference for quota enforcement, which created a bottleneck. Better throughput can be achieved by using the system time of the TMs instead and ensuring that all the servers are in sync (using NTP or another mechanism). The property that switches between these two mechanisms is available in the TM but is not exposed through the TM property file; it is set by using the tml_tm_properties.json deployment property file. To use the cache servers as a shared time reference, set the use_system_time property to false; to use the TMs' system time, set it to true.
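For illustration, here is a minimal sketch of the relevant entry in tml_tm_properties.json, assuming the file is a flat JSON object of TM properties (the structure of the template shipped with your deployment may differ):

```json
{
  "use_system_time": true
}
```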

Create a K8s secret from this property file and overwrite the existing template inside the TM container. With this tweak, a TPS of 110K-115K was achieved for a specific cluster. In this series of tests, extra large clusters were created with node counts varying from 8 to 30. In each cluster, TMs ran on dedicated nodes, log containers ran on dedicated nodes, and the other container types shared the remaining nodes. Using the K8s anti-affinity feature, no two NoSQL, cache, log, or TM containers ran on the same node. CPU utilization was around 55-60% on the TM nodes when these maximum TPS values were reached.
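The secret-creation step could look like the following sketch; the secret name and namespace are hypothetical, so match them to whatever your TM deployment actually references:

```sh
# Hypothetical secret name and namespace; the TM deployment must mount this
# secret over the existing tml_tm_properties.json template inside the container.
kubectl create secret generic tml-tm-properties \
  --from-file=tml_tm_properties.json \
  --namespace=tml
```

The details of the tests are shown in the following table.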
| Cluster | Topology | TPS (Unprotected) |
| --- | --- | --- |
| Extra Large-1 | 10 nodes. 5 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 3 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 40,000 |
| Extra Large-2 | 15 nodes. 8 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 3 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 55,000 |
| Extra Large-3 | 20 nodes. 13 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 5 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 85,000 |
| Extra Large-4 | 27 nodes. 20 TMs, each on a dedicated TM node; 2 Logs, each on a dedicated log node; 3 NoSQL, each on a separate node; 5 Cache, each on a separate node shared with NoSQL; 1 SQL on a node shared with NoSQL/cache; 1 CM on a node shared with NoSQL/cache. | 110,000 |

The deployment files were customized so that each TM and each log container ran on its own separate node, while the NoSQL, SQL, CM, and cache containers shared the remaining nodes. Even while sharing nodes, each NoSQL and each cache container ran on a separate node. Pod anti-affinity rules were combined with node labels, and each cluster was divided into three groups of nodes: one group was labeled deploy=tm, where only TMs were deployed; the second group was labeled deploy=log, where only log containers were deployed; and the third group was labeled deploy=other, for the remaining components. These labels were used in the corresponding deployment files under the nodeSelector attribute. Contact Local Edition Support for the customized deployment files; they are delivered as a compressed system folder, usually found in the deployment folder, containing the updated/customized YAML files.
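As a sketch of how the node labels and pod anti-affinity fit together (the deployment name, app label, and image below are hypothetical; only the deploy=tm label value comes from the text above):

```yaml
# Hypothetical fragment of a customized TM deployment; names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tml-tm
spec:
  replicas: 5
  selector:
    matchLabels:
      app: tml-tm
  template:
    metadata:
      labels:
        app: tml-tm
    spec:
      nodeSelector:
        deploy: tm   # schedule TM pods only on nodes labeled deploy=tm
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: tml-tm
              topologyKey: kubernetes.io/hostname   # at most one TM pod per node
      containers:
        - name: tml-tm
          image: tml-tm:5.x   # placeholder image name
```

The node labels themselves can be applied with, for example, `kubectl label nodes <node-name> deploy=tm`.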