Cluster Sizing Summary
This section presents cluster sizing recommendations based on various tests performed on TIBCO Cloud™ API Management - Local Edition 5.x. Some tests used the default deployment scripts shipped with API Management - Local Edition, while others used customized deployment files (not part of the default scripts). Customizations included running only Traffic Managers on dedicated nodes in a cluster, with the other components on the remaining nodes of the same cluster. The following tables can be used as a guideline for creating K8S clusters that meet your performance requirements, expressed as Transactions per Second (TPS). The tests were performed on Local Edition 5.x with variations in response size and backend latency.
Test - Part 1
- All nodes in each cluster had 4-core CPUs and 15 GB of memory.
- Load generation: 8 JMeter hosts in US West1b (GCP).
- Backend: 8 latency-injector hosts in US West1b.
- Local Edition cluster region: US Central1 (GCP).
- These tests were performed with TMs using the getStats method on the memcache servers for time synchronization (the default behavior as of Local Edition 5.3.1).
Response Size (Latency) →, Cluster Type ↓. All values are TPS, shown as Unprotected / Protected (OAuth).

| Cluster Type | 2b (0 ms) | 1kb (100 ms) | 256kb (500 ms) | 1-64kb (100-300 ms) | 1-8kb (30-180 ms) | 4-128kb (100-300 ms) |
|---|---|---|---|---|---|---|
| Xtra Small | 734 / 781 | 723 / 571 | 265 / 229 | 688 / 610 | 761 / 545 | 647 / 385 |
| Small-1 | 1920 / 1800 | 1600 / 1460 | 692 / 590 | 1470 / 1200 | 1850 / 1320 | 1420 / 1300 |
| Small-2 | 1300 / 1230 | 1270 / 1100 | 366 / 283 | 1000 / 913 | 1330 / 733 | 1000 / 723 |
| Medium-1 | 2300 / 1600 | 2100 / 1900 | 909 / 683 | 1870 / 1470 | 2100 / 2000 | 1950 / 1850 |
| Medium-2 | 4400 / 3300 | 3500 / 3300 | 1300 / 12 | 3800 / 3400 | 4200 / 2500 | 3500 / 3000 |
| Large-1 | 2300 / 2200 | 1800 / 1470 | 1420 / 1100 | 2500 / 2300 | 2300 / 1700 | 1750 / 1700 |
| Large-2 | 4000 / 3700 | 3850 / 3300 | 1500 / 1400 | 3500 / 3000 | 4000 / 3380 | 3000 / 2900 |
| Topology | Description | NoSQL Count | Configuration Manager Count | Log Count | SQL Count | Cache Count | TM Count |
|---|---|---|---|---|---|---|---|
| Xtra Small | Number of K8S worker nodes: 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Small-1 | Number of K8S worker nodes: 2 | 1 | 1 | 1 | 1 | 2 | 3 |
| Small-2 | Number of K8S worker nodes: 2. One node is dedicated to the Traffic Manager; the remaining containers run on the other node. | 1 | 1 | 1 | 1 | 1 | 1 |
| Medium-1 | Number of K8S worker nodes: 3 | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 10 (max 4 per node) |
| Medium-2 | Same as Medium-1, but each node has double the capacity (8 core, 30 GB). | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 10 (max 4 per node) |
| Medium-3 | Similar to Medium-2: a cluster of 3 nodes with 3 TMs (as in Medium-1), but each node is 2 core and 8 GB. This test was run to obtain licensing numbers for a total of 6 cores. | 3 (1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (1 per node) | 3 (max 4 per node) |
| Large-1 | Number of K8S worker nodes: 5 | 3 (max 1 per node) | 1 | 5 (max 1 per node) | 1 | 3 (max 1 per node) | 20 (max 4 per node) |
| Large-2 | Number of K8S worker nodes: 6. Three nodes are dedicated to the 15 TMs; all remaining components run on the other 3 nodes. | 3 (max 1 per node) | 1 | 2 (max 1 per node) | 1 | 3 (max 1 per node) | 15 (max 5 per node) |
Test - Part 2
In the first part of the test, a limit of 20-21K TPS was reached for the extra large cluster, and TPS did not increase linearly as the cluster scaled horizontally. Further analysis showed that, for each request, every TM connected to every connected memcache server to get a time reference for quota enforcement, which created a bottleneck. Better throughput is available if the system time of the TMs is used instead, provided all servers are kept in sync (using NTP or another mechanism). The property to switch between these two mechanisms exists in the TM but is not exposed via the TM property file; it is set through the tml_tm_properties.json deployment property file. To use the cache servers as a shared time reference, set the use_system_time property value to false; to use the TMs' system time, set it to true.
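As an illustration only, the snippet below sketches how such an entry might look in tml_tm_properties.json, assuming a flat key/value layout; the actual file generated for your deployment may group properties differently and may expect a different value type, so edit the existing file rather than creating a new one.

```json
{
  "use_system_time": "true"
}
```

With use_system_time set to true, each TM uses its own system clock (keep the nodes synchronized via NTP); setting it to false restores the memcache-based shared time reference described above.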
| Cluster | Topology | TPS (Unprotected) |
|---|---|---|
| Extra Large-1 | 10 nodes. 5 TMs, each on a node dedicated to TM; 2 Logs, each on a node dedicated to Log; 3 NoSQL, each on a separate node; 3 Cache, each on a separate node but shared with NoSQL; 1 SQL on a node shared with NoSQL/Cache; 1 CM on a node shared with NoSQL/Cache. | 40,000 |
| Extra Large-2 | 15 nodes. 8 TMs, each on a node dedicated to TM; 2 Logs, each on a node dedicated to Log; 3 NoSQL, each on a separate node; 3 Cache, each on a separate node but shared with NoSQL; 1 SQL on a node shared with NoSQL/Cache; 1 CM on a node shared with NoSQL/Cache. | 55,000 |
| Extra Large-3 | 20 nodes. 13 TMs, each on a node dedicated to TM; 2 Logs, each on a node dedicated to Log; 3 NoSQL, each on a separate node; 5 Cache, each on a separate node but shared with NoSQL; 1 SQL on a node shared with NoSQL/Cache; 1 CM on a node shared with NoSQL/Cache. | 85,000 |
| Extra Large-4 | 27 nodes. 20 TMs, each on a node dedicated to TM; 2 Logs, each on a node dedicated to Log; 3 NoSQL, each on a separate node; 5 Cache, each on a separate node but shared with NoSQL; 1 SQL on a node shared with NoSQL/Cache; 1 CM on a node shared with NoSQL/Cache. | 110,000 |
The deployment files were customized so that each TM and each Log component ran on its own dedicated node, while the NoSQL, SQL, Cache, and CM components shared the remaining nodes; within that shared group, each NoSQL and Cache instance still ran on a separate node. Pod anti-affinity rules were combined with node labels. Each cluster was divided into three groups of nodes: one group labeled deploy=tm, where only TMs were deployed; a second group labeled deploy=log, where only the Log components were deployed; and a third group labeled deploy=other, for the remaining components. These labels were referenced in the corresponding deployment files under the nodeSelector attribute. Contact Local Edition Support for the customized deployment files, delivered as a compressed system folder (usually found in the deployment folder) containing the updated/customized YAML files.
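The following is a minimal sketch of the kind of node-labeling plus nodeSelector and pod anti-affinity customization described above; the resource names, label values, image, and replica count are hypothetical and will differ from the files supplied by Local Edition Support.

```yaml
# Label each node group first (node names are illustrative):
#   kubectl label nodes tm-node-1 tm-node-2 tm-node-3 deploy=tm
#   kubectl label nodes log-node-1 log-node-2 deploy=log
#   kubectl label nodes shared-node-1 shared-node-2 shared-node-3 deploy=other
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tml-tm                  # hypothetical name for the Traffic Manager deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: tml-tm
  template:
    metadata:
      labels:
        app: tml-tm
    spec:
      nodeSelector:
        deploy: tm              # schedule TM pods only onto nodes labeled deploy=tm
      affinity:
        podAntiAffinity:        # keep each TM pod on a distinct node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: tml-tm
              topologyKey: kubernetes.io/hostname
      containers:
        - name: tml-tm
          image: tml-tm:5.x     # placeholder; use the Traffic Manager image from your installation
```

The same pattern, with deploy=log and deploy=other selectors, applies to the Log deployments and to the remaining components respectively.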