Fault Tolerance of Agents

Inference and query agents in an agent group (that is, all agent instances of the same agent class deployed in the same cluster) are automatically fault tolerant.

Note: Cache agents do not need or use fault tolerance features. Fault tolerance of cache agents is handled transparently by the object management layer. For fault tolerance of cache data, the only configuration tasks are to define the number of backups you want to keep and to provide sufficient storage capacity. Using a backing store is recommended for better reliability (see Reliability of Cache Object Management).

Load is distributed equally among all active agents in the same group. If an agent fails, its load is automatically redistributed among the remaining active agents in the group.
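The following is a conceptual sketch of this behavior, not the BusinessEvents API or its actual distribution algorithm. It assumes work can be modeled as a list of interchangeable items and uses simple round-robin assignment (the function and agent names are hypothetical). When one agent fails, its items are reassigned so the survivors again hold equal shares.

```python
from itertools import cycle


def distribute(work_items, active_agents):
    """Assign work items round-robin so each active agent gets an equal share
    (shares differ by at most one item). Hypothetical model only."""
    if not active_agents:
        raise ValueError("no active agents left in the group")
    assignment = {agent: [] for agent in active_agents}
    for item, agent in zip(work_items, cycle(active_agents)):
        assignment[agent].append(item)
    return assignment


agents = ["agent-1", "agent-2", "agent-3"]
work = list(range(12))
before = distribute(work, agents)

# agent-2 fails: the same work is spread across the remaining active agents.
agents.remove("agent-2")
after = distribute(work, agents)

print({a: len(items) for a, items in before.items()})  # {'agent-1': 4, 'agent-2': 4, 'agent-3': 4}
print({a: len(items) for a, items in after.items()})   # {'agent-1': 6, 'agent-3': 6}
```

No work is lost in the model: every item that agent-2 held is picked up by agent-1 or agent-3.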

You can optionally start a certain number of agents in a group as active agents and keep the rest as standby agents. If an active agent fails, a standby agent is automatically activated. In most situations, however, there is no need to maintain standby agents.
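As a rough illustration, the sketch below models an agent group that allows a fixed number of active agents and keeps the others on standby. The class and the `max_active` parameter are hypothetical names chosen for this sketch; they do not represent the product's configuration or its actual promotion order. When an active agent fails, the longest-waiting standby agent is promoted.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGroup:
    """Toy model of one agent group: at most max_active agents run,
    the rest wait as standbys. Hypothetical, not a product API."""
    max_active: int
    active: list = field(default_factory=list)
    standby: list = field(default_factory=list)

    def start(self, agent_id):
        # Fill the active slots first; any extra agents become standbys.
        if len(self.active) < self.max_active:
            self.active.append(agent_id)
        else:
            self.standby.append(agent_id)

    def fail(self, agent_id):
        # Remove a failed agent; if it was active, promote the oldest standby.
        if agent_id in self.active:
            self.active.remove(agent_id)
            if self.standby:
                self.active.append(self.standby.pop(0))
        elif agent_id in self.standby:
            self.standby.remove(agent_id)


group = AgentGroup(max_active=2)
for agent in ["A", "B", "C"]:
    group.start(agent)  # A and B become active, C waits on standby

group.fail("A")
print(group.active, group.standby)  # ['B', 'C'] []
```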

Note: Fault Tolerance Limitation in Inference Agents: Entities that use Memory Only cache mode are not recoverable in failover or failback situations.

Behavior of Standby Agents

Query agents do not maintain stateful objects. When a standby agent becomes active, it simply begins to take on work.

Standby inference agents maintain a passive Rete network. They do not listen to events from channels, do not update working memory, and do not read from or write to the cache.

Note: Startup rule functions do not execute on failover: When a standby or inactive node becomes active, it does not execute startup rule functions.
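The standby behavior described above can be summarized in a small state model. This is a hypothetical sketch, not the BusinessEvents engine: the class, its attributes, and the integer cache counter are stand-ins invented for illustration. While standby, the model ignores channel events and leaves working memory and the cache untouched. When it is activated, it starts processing events but does not run its startup functions.

```python
class InferenceAgentModel:
    """Toy model of a standby inference agent (hypothetical, not a product API)."""

    def __init__(self, startup_functions):
        self.standby = True
        self.working_memory = []   # stand-in for the agent's working memory
        self.cache_ops = 0         # counts simulated cache reads/writes
        self.startup_functions = startup_functions

    def on_channel_event(self, event):
        if self.standby:
            return False  # standby: no channel listening, no working memory or cache access
        self.working_memory.append(event)  # assert the event into working memory
        self.cache_ops += 1                # simulated cache write
        return True

    def activate(self):
        # Failover: become active WITHOUT calling any startup functions.
        self.standby = False


calls = []
agent = InferenceAgentModel(startup_functions=[lambda: calls.append("init")])
agent.on_channel_event("e1")  # ignored while standby
agent.activate()
agent.on_channel_event("e2")  # processed after activation
print(agent.working_memory, agent.cache_ops, calls)  # ['e2'] 1 []
```

The empty `calls` list shows that activation did not invoke the startup function. Any initialization that must happen on failover therefore cannot rely on startup rule functions.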