Object Management and Fault Tolerance Scenarios
The tables in this section help you understand how fault tolerance and object management options work in various deployment scenarios to maintain data integrity. The tables explain what is possible in each type of object management given the following conditions:
Nodes  One or multiple nodes, where a node is a JVM containing one BusinessEvents server.
Agents  One or multiple inference agents. Each inference agent is configured by one BAR resource in a project. An inference agent has a Rete network. See Designing With Multiple Active Inference Agents for related details.
When implementing a recovery strategy for a rule engine product such as BusinessEvents, you must take care to maintain the integrity of stateful objects. Concepts and scorecards are stateful objects and must maintain state across inference agents. Not all object management options provide that capability.
In Memory and Persistence OM—Behavior in Multiple-Agent Nodes
When multiple agents in a node use In Memory or Persistence object management options, concept instances and scorecards are not shared between them. For behavior of multiple agents in a node with In Memory OM, see Local Channels.
For behavior of multiple concurrent agents in Cache OM deployments, see Designing With Multiple Active Inference Agents.
In Memory Object Management and Fault Tolerance Scenarios
 
1 Node, n Agents
n Nodes, 1 Agent  Data is isolated in each node JVM. Failover and failback are allowed. Object state is not preserved or transferred. Recommended only for stateless operations.
n Nodes, n Agents  Object state is not maintained during failover and failback. Recommended only for stateless operations.
Persistence Object Management and Fault Tolerance Scenarios
Fault Tolerance with Persistence-Based Object Management  As explained in the table below, the BusinessEvents built-in fault tolerance feature is not supported for use with persistence-based object management. You can implement a custom solution, however.
 
1 Node, 1 Agent  Data is isolated in a single persistence database. On recovery, object state is recovered to the last checkpoint.
1 Node, n Agents  In all deployment scenarios, each agent’s data is isolated in a separate persistence database. On recovery, object state is recovered to the last checkpoint of the appropriate database.
n Nodes, 1 Agent  Not supported with BusinessEvents built-in fault tolerance. Automatic failover and failback are not possible because of lock files. Use a custom solution.
n Nodes, n Agents  Not supported with BusinessEvents built-in fault tolerance. Automatic failover and failback are not possible because of lock files. Multiple write operations by agents on the primary node could lead to data inconsistency. Use a custom solution.
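The checkpoint behavior described for persistence-based object management can be illustrated with a minimal sketch. This is a toy model only, not BusinessEvents code: the `CheckpointedStore` class and its method names are hypothetical, and a plain dictionary stands in for one agent's persistence database. It shows that changes made after the last checkpoint are not recovered.

```python
import copy


class CheckpointedStore:
    """Toy stand-in for one agent's persistence database (hypothetical)."""

    def __init__(self):
        self.live = {}        # current object state held by the agent
        self.checkpoint = {}  # object state as of the last checkpoint

    def put(self, key, value):
        self.live[key] = value

    def write_checkpoint(self):
        # Snapshot the current state; this is what recovery will restore.
        self.checkpoint = copy.deepcopy(self.live)

    def recover(self):
        # After a failure, only the checkpointed state comes back.
        self.live = copy.deepcopy(self.checkpoint)
        return self.live


store = CheckpointedStore()
store.put("order-1", "NEW")
store.write_checkpoint()
store.put("order-1", "SHIPPED")  # changed after the checkpoint
store.put("order-2", "NEW")      # created after the checkpoint

print(store.recover())  # prints {'order-1': 'NEW'}
```

With multiple agents, each agent would own a separate store of this kind, which is why recovery applies to the last checkpoint of the appropriate database.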
Cache Object Management and Fault Tolerance Scenarios
In all cases it is assumed that dedicated cache servers are also running. Fault tolerance of the engine process refers to inference agents only. See Distributed Cache and Multi-Engine Architecture and Terms.
If you use multi-engine features, fault tolerance is implicit. When all agents in an agent group are active, if any active agent fails, the remaining agents in the group automatically handle the workload.
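The way an agent group reacts to a failure can be sketched as follows. Agent groups in this section are configured with a MaxActive property and a Priority property; the sketch models those two settings only. It is not BusinessEvents code: the `Agent` class and `active_agents` function are hypothetical, and it assumes that a lower priority number means higher priority and that a MaxActive value of 0 means no limit.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    priority: int         # assumption: lower number means higher priority
    running: bool = True


def active_agents(group, max_active):
    """Return the names of the agents that should be active right now.

    Running agents are ranked by priority and the first max_active of
    them are active; the rest stay on standby. max_active=0 is treated
    as "no limit" (an assumption made for this sketch).
    """
    ranked = sorted((a for a in group if a.running), key=lambda a: a.priority)
    if max_active > 0:
        ranked = ranked[:max_active]
    return [a.name for a in ranked]


group = [Agent("agent-a", 1), Agent("agent-b", 2), Agent("agent-c", 3)]

# All agents active: the whole group shares the work.
print(active_agents(group, max_active=0))

# One active agent: the highest-priority agent does the work.
print(active_agents(group, max_active=1))

# Failover: agent-a stops, so the next agent by priority takes over.
group[0].running = False
print(active_agents(group, max_active=1))

# Failback: agent-a returns and becomes active again.
group[0].running = True
print(active_agents(group, max_active=1))
```

When every agent is active, a failure simply shrinks the list of agents sharing the load, which is the sense in which fault tolerance is implicit.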
In all cases, in the event of total system failure, use of a backing store ensures recovery of data written to the backing store.
1 Node, n Agents  (N/A) Each agent in the same node is a different agent, not part of the same agent group.
n Nodes, 1 Agent
Multi-engine mode: If one or more agents in a group fail, the load is distributed among the remaining agents in that group. All agents can be active, or some can be inactive. Configuration uses a MaxActive property and a Priority property.
Single-engine mode: The Priority setting determines which agent in an agent group is active, as well as the failover and failback order.
Cluster data is shared between agents in all groups across all nodes, using the cache cluster.
If the number of cache object backups is one, one cache server (at a time) can fail with no data loss. If the number of backups is two, two servers can fail, and so on.
Without a backing store, caches exist in memory only, so recovery is not available after a total system failure: all data in each JVM's memory is lost.
With a backing store, data written to the backing store can be recovered after a total system failure.
Multi-engine mode: N/A. Fault tolerance is implicit.
n Nodes, n Agents  Same as n Nodes, 1 Agent. Each agent in a node is fault tolerant with the agents of the same agent group that are deployed in other nodes.
Multi-engine mode: N/A. Fault tolerance is implicit.
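The backup-count rule stated for cache servers (one backup tolerates one failed server, two backups tolerate two, and so on) can be checked with a small toy model. This is a sketch under stated assumptions, not the product's actual partitioning scheme: the `place` and `data_lost` functions, the round-robin placement, and the server and object names are all hypothetical. It also tests the stricter case of simultaneous failures, whereas the text describes servers failing one at a time.

```python
from itertools import combinations


def place(objects, servers, backup_count):
    """Assign each object to backup_count + 1 distinct servers (round robin)."""
    copies = backup_count + 1
    if copies > len(servers):
        raise ValueError("need more servers than backups")
    return {
        obj: {servers[(i + k) % len(servers)] for k in range(copies)}
        for i, obj in enumerate(objects)
    }


def data_lost(placement, failed):
    """True if some object has every one of its copies on a failed server."""
    failed = set(failed)
    return any(holders <= failed for holders in placement.values())


servers = ["cache-1", "cache-2", "cache-3", "cache-4"]
objects = [f"concept-{n}" for n in range(8)]

for backups in (1, 2):
    placement = place(objects, servers, backups)
    # No combination of `backups` failed servers loses data...
    safe = all(not data_lost(placement, f)
               for f in combinations(servers, backups))
    # ...but some combination of `backups + 1` failed servers does.
    unsafe = any(data_lost(placement, f)
                 for f in combinations(servers, backups + 1))
    print(backups, safe, unsafe)
```

In this model every object has one more copy than the backup count, so data is lost only when the number of failed servers exceeds the backup count.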