Copyright © TIBCO Software Inc. All Rights Reserved


Chapter 7 Distributed Cache OM : Load Balancing and Fault Tolerance of Inference Agents

Load Balancing and Fault Tolerance of Inference Agents
This section discusses the behavior of inference agents.
Cache agents and query agents do not need or use fault tolerance features. Query agents do not maintain stateful objects and therefore do not require fault tolerance. Fault tolerance of cache agents is handled transparently by the object management layer. For fault tolerance of cache data, the only configuration tasks are to define the number of backups you want to keep and to provide sufficient storage capacity. Using a backing store is recommended for better reliability (see Reliability of Cache Object Management).
Load Balancing of Inference Agents in a Group
Load balancing enables horizontal and vertical scaling. The underlying cluster behaves like a database for all the agents connected to the cluster. Load balancing makes use of point-to-point messaging, such as JMS queues. With point-to-point communication, messages are automatically distributed among the members of an agent group. (You can also use different agents to listen to different queues.)
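The point-to-point behavior described above can be illustrated with a small simulation. The following is a minimal sketch, not BusinessEvents or JMS code: it assumes a plain in-memory java.util.concurrent queue stands in for a JMS queue and that threads stand in for the inference agents of one group. The class name QueueLoadBalancingSketch, the method run, and the STOP marker are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Simulation only: an in-memory queue stands in for a JMS queue,
// and threads stand in for the inference agents of one agent group.
class QueueLoadBalancingSketch {

    // Hypothetical marker telling a consumer thread to stop; not a JMS concept.
    private static final String STOP = "__STOP__";

    /** Lets agentCount consumers compete for messageCount messages; returns messages handled per agent. */
    static Map<String, Integer> run(int agentCount, int messageCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<Thread> agents = new ArrayList<>();
        List<AtomicInteger> counters = new ArrayList<>();

        for (int i = 0; i < agentCount; i++) {
            AtomicInteger handled = new AtomicInteger();
            counters.add(handled);
            Thread agent = new Thread(() -> {
                try {
                    // take() removes the message, so exactly one agent receives it.
                    while (!STOP.equals(queue.take())) {
                        handled.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "agent-" + i);
            agents.add(agent);
            agent.start();
        }

        try {
            for (int m = 0; m < messageCount; m++) {
                queue.put("event-" + m);
            }
            // One stop marker per agent; each agent exits after taking one.
            for (int i = 0; i < agentCount; i++) {
                queue.put(STOP);
            }
            for (Thread agent : agents) {
                agent.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }

        Map<String, Integer> result = new TreeMap<>();
        for (int i = 0; i < agentCount; i++) {
            result.put(agents.get(i).getName(), counters.get(i).get());
        }
        return result;
    }

    public static void main(String[] args) {
        // Per-agent counts vary from run to run; the total always equals the number sent.
        System.out.println(run(3, 300));
    }
}
```

In this sketch every message is handled by exactly one consumer, which is the property that lets a queue spread work across the members of a group. The split between consumers depends on thread scheduling here; how a real JMS provider divides messages among receivers is provider-specific and is not modeled.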
Every JMS input destination runs in its own JMS Session. This provides good throughput for processing and requires fewer connections (see Each JMS Input Destination Runs a Session).
Certain aspects of the design have to be managed by the application. See Designing for Concurrency for related information.
Fault Tolerance Between Inference Agents in a Group
All inference agents in an agent group (that is, all agent instances of the same agent class deployed in the same cluster) automatically behave in a fault tolerant manner. Load is distributed equally among all active agents in the same group. If any agent fails, the load is automatically redistributed among the remaining active agents in the group.
You can optionally start a certain number of agents in a group and keep the rest as standby agents. If an active agent fails, a standby agent is automatically activated. For most situations, there is no need to maintain standby agents.
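The group behavior described above can be modeled in a few lines. The following is a minimal sketch under stated assumptions, not the product's implementation: the class AgentGroupSketch, the maxActive limit, and all method names are hypothetical. It also assumes that the standby agent that has waited longest is promoted first; the source does not say which standby agent is chosen.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of one agent group with active and standby members.
class AgentGroupSketch {
    private final int maxActive; // hypothetical limit on concurrently active agents
    private final Set<String> active = new LinkedHashSet<>();
    private final Deque<String> standby = new ArrayDeque<>();

    AgentGroupSketch(int maxActive) {
        this.maxActive = maxActive;
    }

    /** A started agent becomes active until the limit is reached, then joins the standby list. */
    void start(String agent) {
        if (active.size() < maxActive) {
            active.add(agent);
        } else {
            standby.addLast(agent);
        }
    }

    /** Removes a failed agent; if it was active, promotes a standby agent when one exists. */
    void fail(String agent) {
        if (active.remove(agent)) {
            String promoted = standby.pollFirst(); // assumption: longest-waiting standby first
            if (promoted != null) {
                active.add(promoted);
            }
        } else {
            standby.remove(agent);
        }
    }

    List<String> activeAgents() {
        return new ArrayList<>(active);
    }

    List<String> standbyAgents() {
        return new ArrayList<>(standby);
    }

    /** Splits messageCount as evenly as possible across the currently active agents. */
    Map<String, Integer> distribute(int messageCount) {
        Map<String, Integer> share = new LinkedHashMap<>();
        if (active.isEmpty()) {
            return share;
        }
        int base = messageCount / active.size();
        int extra = messageCount % active.size();
        int index = 0;
        for (String agent : active) {
            share.put(agent, base + (index < extra ? 1 : 0));
            index++;
        }
        return share;
    }

    public static void main(String[] args) {
        AgentGroupSketch group = new AgentGroupSketch(2);
        group.start("a");
        group.start("b");
        group.start("c");            // third agent waits as standby
        group.fail("a");             // standby agent c is promoted
        System.out.println(group.distribute(10));
        group.fail("b");             // no standby left; c carries the whole load
        System.out.println(group.distribute(10));
    }
}
```

With no standby agents configured (maxActive at least as large as the group), the same model simply redistributes the load over whichever agents remain.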
Behavior of Standby Agents
Standby agents maintain a passive Rete network. They do not listen to events from channels, do not update working memory, and do not perform read or write operations on the cache.
Startup rule functions do not execute on failover. When a standby or inactive node becomes active, it does not execute startup rule functions.
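The standby behavior described above can be pictured as a small state model. This is an illustrative sketch only, with hypothetical names throughout (InferenceAgentSketch, activate, onEvent, startupRuns). It assumes that startup logic runs once when an agent starts in the active state; the source states only that it does not run when a standby or inactive node becomes active.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of one inference agent that may start as active or standby.
class InferenceAgentSketch {
    private boolean active;
    private int startupRuns = 0;
    private final List<String> workingMemory = new ArrayList<>();

    InferenceAgentSketch(boolean startActive) {
        this.active = startActive;
        if (startActive) {
            runStartupFunction(); // assumption: startup logic runs on an active start
        }
    }

    private void runStartupFunction() {
        startupRuns++;
    }

    /** Failover path: the agent becomes active without running startup logic. */
    void activate() {
        active = true;
    }

    /** Returns true if the event was processed; a standby agent ignores it. */
    boolean onEvent(String event) {
        if (!active) {
            return false; // standby: no channel input, no working memory update
        }
        workingMemory.add(event);
        return true;
    }

    boolean isActive() {
        return active;
    }

    int startupRuns() {
        return startupRuns;
    }

    int workingMemorySize() {
        return workingMemory.size();
    }

    public static void main(String[] args) {
        InferenceAgentSketch standby = new InferenceAgentSketch(false);
        standby.onEvent("e1");   // ignored while standby
        standby.activate();      // failover
        standby.onEvent("e2");   // processed now
        System.out.println(standby.startupRuns() + " startup runs, "
                + standby.workingMemorySize() + " events in working memory");
    }
}
```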
