Chapter 19 Fault Tolerance : Fault Tolerance Overview

Fault Tolerance Overview
You can arrange TIBCO Enterprise Message Service servers for fault-tolerant operation by configuring a pair of servers—one primary and one backup. The primary server accepts client connections, and interacts with clients to deliver messages. If the primary server fails, the backup server resumes operation in its place. (We do not support more than two servers in a fault-tolerant configuration.)
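The primary/backup arrangement can be pictured from the client's side with the minimal sketch below. It is not the EMS client API, which handles reconnection itself. The function `connect_to_pair`, the probe `is_reachable`, and the host names are hypothetical. A server counts as "reachable" only while it is the active primary that accepts connections.

```python
from typing import Callable, Sequence

# The text describes exactly one primary and one backup server.
PAIR_SIZE = 2


def connect_to_pair(urls: Sequence[str], is_reachable: Callable[[str], bool]) -> str:
    """Return the first URL that accepts connections, trying the primary first.

    `is_reachable` is a hypothetical probe standing in for a real connection attempt.
    """
    if len(urls) != PAIR_SIZE:
        raise ValueError(f"expected a primary and a backup URL, got {len(urls)} URLs")
    for url in urls:  # urls[0] is the primary, urls[1] the backup
        if is_reachable(url):
            return url
    raise ConnectionError("neither server in the pair is accepting connections")


# Placeholder host names and port.
status = {"tcp://host1:7222": False, "tcp://host2:7222": True}
print(connect_to_pair(["tcp://host1:7222", "tcp://host2:7222"], status.get))
# prints tcp://host2:7222
```

Rejecting any list that is not exactly two URLs mirrors the statement that a fault-tolerant configuration has no more than two servers.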
Shared State
A pair of fault-tolerant servers can have access to shared state, which consists of information about client connections and persistent messages. This information enables the backup server to properly assume responsibility for those connections and messages. Figure 25 illustrates a fault-tolerant configuration of EMS.
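The following sketch assumes the shared state is a single store that both servers reference, holding client connections and persistent messages. `SharedState`, `Server`, and their methods are hypothetical names, not EMS classes. Because both servers point at the same object, the backup sees whatever the primary recorded before it failed.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class SharedState:
    """Hypothetical model of what the pair shares."""
    connections: Set[str] = field(default_factory=set)
    persistent_messages: List[str] = field(default_factory=list)


class Server:
    def __init__(self, name: str, state: SharedState) -> None:
        self.name = name
        self.state = state  # both servers reference the SAME SharedState object

    def accept_connection(self, client_id: str) -> None:
        self.state.connections.add(client_id)

    def store_persistent(self, message: str) -> None:
        self.state.persistent_messages.append(message)

    def take_over(self) -> Tuple[Set[str], List[str]]:
        """Called on the backup after failover: recover what the primary recorded."""
        return set(self.state.connections), list(self.state.persistent_messages)


shared = SharedState()
primary, backup = Server("primary", shared), Server("backup", shared)
primary.accept_connection("client-1")
primary.store_persistent("order-42")
# ... the primary fails here ...
print(backup.take_over())  # ({'client-1'}, ['order-42'])
```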
Figure 25 Primary and Backup Servers with Shared State
Locking
To prevent the backup server from assuming the role of the primary server, the primary server locks the shared state during normal operation. If the primary server fails, the lock is released, and the backup server can obtain the lock.
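The locking rule can be sketched as an exclusive lock with a single owner. This is an in-memory model under stated assumptions. `SharedStateLock`, `try_acquire`, and `release_on_failure` are hypothetical names, and the sketch does not show how a real server detects that its peer has failed. The backup keeps trying to acquire the lock and succeeds only once the primary no longer holds it.

```python
import threading
from typing import Optional


class SharedStateLock:
    """Hypothetical exclusive lock on the shared state; at most one server owns it."""

    def __init__(self) -> None:
        self._guard = threading.Lock()       # serializes calls from both servers
        self._owner: Optional[str] = None    # name of the server holding the lock

    def try_acquire(self, server: str) -> bool:
        """Take the lock if it is free; return True if `server` now holds it."""
        with self._guard:
            if self._owner is None:
                self._owner = server
            return self._owner == server

    def release_on_failure(self, server: str) -> None:
        """Models the lock being released because its owner failed."""
        with self._guard:
            if self._owner == server:
                self._owner = None


lock = SharedStateLock()
print(lock.try_acquire("primary"))   # True: the primary locks the shared state
print(lock.try_acquire("backup"))    # False: the backup cannot take over yet
lock.release_on_failure("primary")   # the primary fails and the lock is released
print(lock.try_acquire("backup"))    # True: the backup obtains the lock
```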
Unshared State Failover
You can also include backup servers that do not share state. As with shared state, a second server assumes responsibility for connections and messages after the current server fails. However, unshared state failover is controlled by the EMS client, and it is less fault-tolerant than shared state failover. Because the servers do not share state, messages can be lost, duplicated, or delivered out of order during failover.
Figure 26 illustrates an unshared state fault-tolerant configuration of EMS.
Figure 26 Current and Second Servers with Unshared State
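The sketch below shows why unshared state failover can lose, duplicate, or reorder messages. The scenario is illustrative only: two servers, A and B, each keep their own store. A persists `m3` but fails before the producer receives the confirmation. The producer then fails over to B, resends `m3`, and continues with `m4`. All names and message IDs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

SENT = ["m1", "m2", "m3", "m4"]  # order in which the producer sends


@dataclass
class Report:
    lost: List[str]
    duplicated: List[str]
    out_of_order: bool


def consumer_view(server_a_restarts: bool) -> List[str]:
    """Order in which a consumer receives messages in one hypothetical scenario."""
    # Each server keeps its own store; nothing is shared between them.
    store_a = ["m1", "m2", "m3"]  # A persisted m3, then failed before confirming it
    # Client-controlled failover: the producer moves to B, resends the
    # unconfirmed m3, and continues with m4. B knows nothing about A's store.
    store_b = ["m3", "m4"]

    received = list(store_b)         # the consumer has also moved to B
    if server_a_restarts:
        received.extend(store_a)     # A's stored messages arrive later
    return received


def analyze(received: List[str]) -> Report:
    counts = Counter(received)
    first_delivery = list(dict.fromkeys(received))  # order of first arrival
    return Report(
        lost=[m for m in SENT if counts[m] == 0],
        duplicated=[m for m in SENT if counts[m] > 1],
        out_of_order=first_delivery != [m for m in SENT if counts[m] > 0],
    )


print(analyze(consumer_view(server_a_restarts=False)))
# Report(lost=['m1', 'm2'], duplicated=[], out_of_order=False)
print(analyze(consumer_view(server_a_restarts=True)))
# Report(lost=[], duplicated=['m3'], out_of_order=True)
```

If A never returns, the messages only A stored never reach the consumer. If A does return, its backlog can arrive after B's messages and include a copy of the resent message.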
Configuration Files
When a primary server fails, its backup server assumes the status of the primary server and resumes operation. Before becoming the new primary server, the backup server re-reads all of its configuration files. If the two servers share configuration files, then administrative changes to the old primary carry over to the new primary.
When fault-tolerant servers share configuration files, you must make configuration changes on the current primary server only. Separately reconfiguring the backup server can cause it to overwrite the shared configuration files, which can result in unintended misconfiguration.
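The following sketch models both configuration rules, assuming the shared configuration files can be represented as one dictionary that both servers reference. First, the backup re-reads every file before it becomes primary, so changes made through the old primary carry over. Second, only the current primary writes changes. `FTServer`, `apply_admin_change`, and `promote` are hypothetical names, and the file name is illustrative. The permission check is also hypothetical: the text states the primary-only rule as an operating practice, not as a check the server enforces.

```python
from typing import Dict


class FTServer:
    """Hypothetical model of one server in a pair that shares configuration files."""

    def __init__(self, name: str, shared_files: Dict[str, str], primary: bool) -> None:
        self.name = name
        self.shared_files = shared_files    # same dict object for both servers
        self.primary = primary
        self.config = dict(shared_files)    # snapshot read at startup

    def apply_admin_change(self, filename: str, contents: str) -> None:
        """Write a configuration change; only the current primary may do this."""
        if not self.primary:
            raise PermissionError(
                f"{self.name} is the backup; make changes on the primary only")
        self.shared_files[filename] = contents  # persisted to the shared files
        self.config[filename] = contents        # and applied in memory

    def promote(self) -> None:
        """Re-read every configuration file, then take over as primary."""
        self.config = dict(self.shared_files)
        self.primary = True


files = {"queues.conf": "orders\n"}
primary = FTServer("primary", files, primary=True)
backup = FTServer("backup", files, primary=False)

primary.apply_admin_change("queues.conf", "orders\ninvoices\n")
# The primary fails; the backup re-reads the shared files as it takes over.
primary.primary = False
backup.promote()
print(backup.config["queues.conf"].split())  # ['orders', 'invoices']
```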