Scaling for Large Data Sets and Workloads

TIBCO Patterns is designed to accommodate a wide variety of customer workloads with respect to data set size and query throughput while maintaining the highest matching accuracy. This section describes the scaling capabilities of TIBCO Patterns: multithreading, horizontal scaling and failover replication, and clustering.

Multithreading

TIBCO Patterns makes use of available computing resources by spawning worker threads that process different queries simultaneously. This maximizes throughput and provides scalability with very low overhead.

TIBCO Patterns algorithms exploit memory locality to optimize performance through judicious use of the processor cache. In almost all cases, the best performance in a production environment is attained by running a single instance of TIBCO Patterns on a dedicated machine, with multithreading configured to use all available processor cores. For instance, if the dedicated machine has two quad-core processors with hyper-threading, and thus a total of 16 logical cores, you would start the TIBCO Patterns server with the maximum number of worker threads set to 16.
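The core-count arithmetic in the example above can be sketched in Java. This is only an illustration of the sizing rule: the class and method names are hypothetical, and the sketch does not show how the thread limit is actually passed to the TIBCO Patterns server, which is covered by the product's own documentation.

```java
/** Illustrative sizing helper; names are hypothetical, not part of any TIBCO API. */
class WorkerThreadSizing {

    /** Logical cores = sockets x cores per socket x hardware threads per core. */
    static int logicalCores(int sockets, int coresPerSocket, int threadsPerCore) {
        return sockets * coresPerSocket * threadsPerCore;
    }

    /** One worker thread per logical core, never fewer than one. */
    static int workerThreadsFor(int logicalCores) {
        return Math.max(1, logicalCores);
    }

    public static void main(String[] args) {
        // The machine from the text: two quad-core processors with hyper-threading.
        int fromSpec = logicalCores(2, 4, 2);
        System.out.println("Worker threads for the example machine: "
                + workerThreadsFor(fromSpec)); // prints 16

        // On the server host itself, the JVM reports the logical processors
        // that are visible to it.
        int detected = Runtime.getRuntime().availableProcessors();
        System.out.println("Worker threads for this machine: "
                + workerThreadsFor(detected));
    }
}
```

Run on the dedicated server host, the second figure is the value the rule in this section would suggest for the maximum number of worker threads.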

In production environments, run the client application on a separate machine from the TIBCO Patterns server to avoid contention for CPU resources and processor cache.

Horizontal Scaling and Failover Replication

If a single dedicated server does not meet your total throughput requirements, an easy way to boost throughput is to distribute queries across several identical servers using industry-standard load balancers. Load balancing can also be used to provide an alternate server, or set of servers, in the event that the primary set of servers becomes unavailable. Because TIBCO Patterns client applications communicate with TIBCO Patterns servers through standard TCP socket requests, standard load balancers or failover devices fit seamlessly in front of a set of identical TIBCO Patterns servers.

Note that load balancing can be used only for read-only operations, such as queries and record fetches. A table update must be applied to every instance of the TIBCO Patterns server, so it cannot be sent through a load balancer, which forwards each request to just a single instance.
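The difference between the two kinds of request can be sketched as follows. This is a minimal in-memory illustration, not the TIBCO Patterns client API: ServerStub, InMemoryServer, and ReplicaRouter are hypothetical names, and the round-robin choice stands in for whatever policy a real load balancer applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a connection to one server instance. */
interface ServerStub {
    String query(String q);
    void update(String record);
}

/** In-memory stub used only to make the sketch runnable. */
class InMemoryServer implements ServerStub {
    final String name;
    final List<String> records = new ArrayList<>();

    InMemoryServer(String name) { this.name = name; }

    public String query(String q) { return name + " answered " + q; }

    public void update(String record) { records.add(record); }
}

/** Routes reads to one replica and writes to every replica. */
class ReplicaRouter {
    private final List<? extends ServerStub> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReplicaRouter(List<? extends ServerStub> replicas) { this.replicas = replicas; }

    /** Read-only request: any single replica can answer (round robin here). */
    String query(String q) {
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i).query(q);
    }

    /** Update: applied to every replica so that all copies stay identical. */
    void update(String record) {
        for (ServerStub s : replicas) {
            s.update(record); // a real client would also handle per-server failures
        }
    }
}

class ReplicaRouterDemo {
    public static void main(String[] args) {
        InMemoryServer a = new InMemoryServer("server-a");
        InMemoryServer b = new InMemoryServer("server-b");
        ReplicaRouter router = new ReplicaRouter(List.of(a, b));

        router.update("record-1");                 // reaches both servers
        System.out.println(router.query("smith")); // answered by one server
        System.out.println(router.query("smith")); // answered by the other
        System.out.println(a.records.size() + " " + b.records.size());
    }
}
```

In this sketch the application itself fans updates out to each server, while queries could equally be sent through a load balancer in front of the same servers.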

Clustering

Some tables are too large to fit into memory on a single machine. Query latency also tends to increase with table size, and might not meet requirements for a very large table when a single machine is used. To solve these problems, TIBCO Patterns allows tables to be split across a cluster of machines. A TIBCO Patterns server, called a gateway server, can be configured to act as a manager for this cluster. Applications perform all actions through the gateway server just as they would with a standard server, so an application can migrate from a single-machine solution to a cluster solution with no change. From the client program's point of view, the tables appear to exist on the gateway as they would on a single instance. The splitting of tables and the distribution of commands across the machines of the cluster are transparent to the application.
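The general idea of a gateway that hides a split table can be illustrated with a toy scatter-gather sketch. Everything here is an assumption made for illustration: the class names are hypothetical, rows are placed by hash, and "matching" is a plain substring test. It does not describe how TIBCO Patterns actually partitions tables or scores matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One cluster machine holding a slice of the table (hypothetical). */
class ClusterNodeSketch {
    final List<String> rows = new ArrayList<>();

    /** Toy match: rows that contain the query text, ignoring case. */
    List<String> search(String q) {
        List<String> hits = new ArrayList<>();
        String needle = q.toLowerCase();
        for (String row : rows) {
            if (row.toLowerCase().contains(needle)) {
                hits.add(row);
            }
        }
        return hits;
    }
}

/** Presents the split table to callers as if it were a single table. */
class GatewaySketch {
    private final List<ClusterNodeSketch> nodes = new ArrayList<>();

    GatewaySketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ClusterNodeSketch());
        }
    }

    /** Places each row on exactly one node (hash placement is an assumption). */
    void add(String row) {
        int i = Math.floorMod(row.hashCode(), nodes.size());
        nodes.get(i).rows.add(row);
    }

    /** Scatter the query to every node, then gather and merge the results. */
    List<String> search(String q) {
        List<String> merged = new ArrayList<>();
        for (ClusterNodeSketch node : nodes) {
            merged.addAll(node.search(q));
        }
        merged.sort(Comparator.naturalOrder());
        return merged;
    }

    /** Total rows across all nodes. */
    int rowCount() {
        int total = 0;
        for (ClusterNodeSketch node : nodes) {
            total += node.rows.size();
        }
        return total;
    }
}

class GatewaySketchDemo {
    public static void main(String[] args) {
        GatewaySketch gateway = new GatewaySketch(3);
        gateway.add("Alice Smith");
        gateway.add("Bob Smith");
        gateway.add("Carol Jones");
        // The caller never sees which node holds which row.
        System.out.println(gateway.search("smith"));
    }
}
```

The caller's code is the same whether the gateway is backed by one node or several, which is the property that lets an application move to a cluster without changes. Each node also searches only its own slice, which is why splitting a table can reduce latency.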

The strategies of horizontal scaling and clustering can be used in combination. Just as a load balancer distributes queries across multiple servers, it can distribute queries across multiple gateways, each managing its own independent cluster.

Clustering can decrease query latency and increase total throughput to some degree. Horizontal scaling does not affect query latency, but it can increase total throughput to whatever degree is required by adding servers. Combining the two makes it possible to handle very large tables with stringent throughput and latency requirements.

You can find the details on configuring a gateway and managing clusters in the TIBCO Patterns Installation Guide.

For details on working with clusters, see the "Java API Reference" and ".NET API Reference" topics in TIBCO Patterns Online References.