Object Management and Fault Tolerance Scenarios
The tables in this section help you understand how fault tolerance and object management options work in various deployment scenarios to maintain data integrity. The tables explain what is possible in each type of object management given the following conditions:
Processing Units (PUs)  One or multiple PUs, where a PU is a BusinessEvents server running in one JVM.
Agents  One or multiple inference agents running in a PU. Each inference agent is configured by an agent class in the CDD. An inference agent has one or more Rete networks. See Designing for Concurrency for related details.
When implementing a recovery strategy, you must take care to maintain the integrity of stateful objects. Concepts and scorecards are stateful objects and must maintain state across inference agents. Not all object management options preserve that state.
Cache OM with Memory Only Mode on All Objects and Fault Tolerance Scenarios
In Memory object management does not support fault tolerance. The following table presents the options available if you use Cache OM with Memory Only mode set on all objects, which provides fault tolerance for memory-only objects.
 
n Agents
n PUs, 1 Agent
Data is isolated in each PU. Failover and failback are allowed. Object state is not preserved or transferred. Recommended only for stateless operations.
n PUs, n Agents
Object state is not maintained during failover and failback. Recommended only for stateless operations.
Berkeley DB Object Management and Fault Tolerance Scenarios
Fault Tolerance with Persistence-Based Object Management  As explained in the table below, the BusinessEvents built-in fault tolerance feature is not supported for use with persistence-based object management. You can implement a custom solution, however.
 
1 PU, 1 Agent
Data is isolated in a single persistence database. On recovery, object state is recovered to the last checkpoint.
1 PU, n Agents
In all deployment scenarios, each agent’s data is isolated in a separate persistence database. On recovery, object state is recovered to the last checkpoint of the appropriate database.
n PUs, 1 Agent
Not supported with BusinessEvents built-in fault tolerance. Automatic failover and failback are not possible due to the presence of lock files. Use a custom solution.
n PUs, n Agents
Not supported with BusinessEvents built-in fault tolerance. Automatic failover and failback are not possible due to the presence of lock files. Multiple write operations by agents on the primary PU could lead to data inconsistency. Use a custom solution.
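The lock-file constraint noted in the table can be illustrated with a minimal sketch. It does not use Berkeley DB or any BusinessEvents API. It only shows the general exclusive lock-file pattern, assuming that a persistence directory is guarded by a file that the owning process creates. The helper names and the `db.lock` file name are hypothetical.

```python
import os
import tempfile


def acquire_lock(db_dir):
    """Try to take exclusive ownership of db_dir by creating a lock file.

    Returns the lock path on success, or None if a lock file already exists.
    """
    lock_path = os.path.join(db_dir, "db.lock")  # hypothetical file name
    try:
        # O_CREAT | O_EXCL makes creation fail if the file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.close(fd)
    return lock_path


def release_lock(lock_path):
    """Remove the lock file so that another process can take ownership."""
    os.remove(lock_path)


def demo():
    with tempfile.TemporaryDirectory() as db_dir:
        primary = acquire_lock(db_dir)   # primary PU opens the database
        standby = acquire_lock(db_dir)   # standby PU attempts to take over
        blocked = standby is None        # the leftover lock file blocks it

        # A custom solution would have to clear the stale lock explicitly.
        release_lock(primary)
        standby_after = acquire_lock(db_dir)
        took_over = standby_after is not None
        release_lock(standby_after)
        return blocked, took_over
```

In this simplified model, a standby cannot open the database until something outside the built-in mechanism removes the stale lock, which is the kind of step a custom solution has to handle.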
Cache Object Management and Fault Tolerance Scenarios
In all cases it is assumed that dedicated cache agents are also running. Fault tolerance of the engine process refers to inference agents only. See Distributed Cache and Multi-Agent Architecture and Terms.
If you use multi-engine (multi-agent) features, fault tolerance is implicit. When all agents in an agent group are active and any active agent fails, the remaining agents in the group automatically handle the workload.
In all cases, in the event of total system failure, use of a backing store ensures recovery of data written to the backing store.
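The implicit fault tolerance described above can be sketched as a small selection function. This is an illustration only, not the BusinessEvents implementation. The `Agent` class and `active_agents` function are hypothetical, the `max_active` and `priority` fields merely stand in for the MaxActive and Priority properties named in the table for this section, and the sketch assumes that a lower priority number means a preferred agent.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    priority: int       # assumption: lower number = preferred agent
    alive: bool = True


def active_agents(group, max_active):
    """Return the names of the agents that should be active.

    Live agents are ranked by priority and the first max_active are chosen.
    The remaining live agents act as standbys.
    """
    candidates = sorted((a for a in group if a.alive), key=lambda a: a.priority)
    return [a.name for a in candidates[:max_active]]


def demo():
    group = [Agent("A", 1), Agent("B", 2), Agent("C", 3)]

    before = active_agents(group, max_active=2)   # A and B active, C standby
    group[0].alive = False                        # A fails: failover
    during = active_agents(group, max_active=2)   # C is promoted
    group[0].alive = True                         # A returns: failback
    after = active_agents(group, max_active=2)
    return before, during, after
```

Setting `max_active` to the group size models the case where all agents are active: a failed agent simply drops out of the list and the remaining agents carry the load.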
1 PU, n Agents
(N/A) Agents in the same PU are different agents and are not part of the same agent group.
n PUs, 1 Agent
Multi-agent mode: If one or more agents in a group fail, the load is distributed among the remaining agents in that group. All agents can be active, or some can be standbys. Configuration uses a MaxActive property and a Priority property.
Single-engine mode (deprecated feature): The Priority setting determines which agent in an agent group is active, as well as the failover and failback order.
Cluster data is shared between agents in all groups across all PUs, using the cache cluster.
If the number of cache object backups is one, one cache agent at a time can fail with no data loss. If the number of backups is two, two cache agents can fail, and so on.
Because caches exist in memory only, all data in each JVM's memory is lost in a total system failure, and recovery is not available without a backing store.
In the event of total system failure, use of a backing store ensures recovery of data written to the backing store.
n PUs, n Agents
Same as n PUs, 1 Agent. Each agent in a PU is fault tolerant with the agents of the same agent group that are deployed in other PUs.
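The relationship between the number of cache object backups and the number of cache agents that can fail without data loss can be checked with a small model. This is a sketch under stated assumptions: copies are placed round-robin on distinct cache agents, which is a simplification and not the cache's actual partitioning scheme, and all function names are hypothetical.

```python
from itertools import combinations


def place_copies(num_objects, num_agents, backup_count):
    """Map each object id to the set of cache agents holding a copy."""
    copies = backup_count + 1  # one primary copy plus the backups
    if copies > num_agents:
        raise ValueError("not enough cache agents for the requested backups")
    # Assumption: simple round-robin placement on consecutive agents.
    return {
        obj: {(obj + k) % num_agents for k in range(copies)}
        for obj in range(num_objects)
    }


def survives(placement, failed):
    """True if every object still has at least one copy on a live agent."""
    return all(holders - failed for holders in placement.values())


def max_tolerated_failures(num_objects, num_agents, backup_count):
    """Largest f such that ANY f simultaneous agent failures lose no data."""
    placement = place_copies(num_objects, num_agents, backup_count)
    tolerated = 0
    for f in range(1, num_agents + 1):
        all_safe = all(
            survives(placement, set(combo))
            for combo in combinations(range(num_agents), f)
        )
        if not all_safe:
            break
        tolerated = f
    return tolerated
```

For example, `max_tolerated_failures(12, 4, 1)` evaluates to 1 and `max_tolerated_failures(12, 4, 2)` evaluates to 2, matching the rule stated in the table. The model covers simultaneous failures of cache agents only. A total system failure removes every in-memory copy regardless of the backup count.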