Chapter 19 Fault Tolerance : Shared State Failover Process

Shared State Failover Process
This section presents details of the shared state failover sequence.
Detection
A backup server detects a failure of the primary in either of two ways:
Heartbeat Failure—The primary server sends heartbeat messages to the backup server to indicate that it is still operating. When a network failure stops the servers from communicating with each other, the backup server detects the interruption in the steady stream of heartbeats. For details, see Heartbeat Parameters.
Connection Failure—The backup server can detect the failure of its TCP connection with the primary server. When the primary process terminates unexpectedly, the backup server detects the broken connection.
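The two detection paths can be modeled as a small state machine. This Java sketch is hypothetical: FailureDetector and its methods are illustrative names, not part of the EMS API. It assumes that a broken connection is acted on at once. A silent heartbeat stream counts as a failure only after more than the activation interval (see Heartbeat Parameters) has passed since the last heartbeat. Timestamps are passed in explicitly so the logic is easy to test.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of the two detection paths described above. The class,
// enum, and method names are illustrative; they are not TIBCO EMS APIs.
final class FailureDetector {

    enum Reason { NONE, CONNECTION_FAILURE, HEARTBEAT_FAILURE }

    private final long activationIntervalMillis;
    private long lastHeartbeatMillis;
    private boolean connectionBroken;

    FailureDetector(long activationIntervalSeconds, long startMillis) {
        this.activationIntervalMillis = TimeUnit.SECONDS.toMillis(activationIntervalSeconds);
        this.lastHeartbeatMillis = startMillis;  // treat startup as the first heartbeat
    }

    // Record a heartbeat received from the primary at time nowMillis.
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // Record that the TCP connection to the primary has broken.
    void onConnectionClosed() {
        connectionBroken = true;
    }

    // Decide whether the primary should be treated as failed at time nowMillis.
    Reason check(long nowMillis) {
        if (connectionBroken) {
            return Reason.CONNECTION_FAILURE;  // assumption: acted on immediately
        }
        if (nowMillis - lastHeartbeatMillis > activationIntervalMillis) {
            return Reason.HEARTBEAT_FAILURE;   // heartbeats missing for too long
        }
        return Reason.NONE;
    }
}
```

For example, with a 10-second activation interval and a heartbeat last seen at 8 s, check(17000) still returns NONE, while check(18001) returns HEARTBEAT_FAILURE.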
Response
When a backup server (B) detects the failure of the primary server (A), B attempts to assume the role of primary server. First, B obtains the lock on the current shared state. Once B holds the lock and can access the shared state, it becomes the new primary server.
Figure 27 Failed Primary Server
Lock Unavailable
If B cannot obtain the lock immediately, it alternates between attempting to obtain the lock (and become the primary server) and attempting to reconnect to A (and resume as a backup server), until one of these attempts succeeds.
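The alternation between taking the lock and reconnecting to A can be sketched as a loop. In this hypothetical Java sketch, tryAcquireLock and tryReconnectToPrimary stand in for server internals (they are not EMS APIs). The pause between rounds is also an assumption, since the text does not say how long B waits between attempts.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of backup server B's behavior when the shared-state
// lock is not immediately available. Not an EMS API.
final class TakeoverLoop {

    enum Outcome { BECAME_PRIMARY, RESUMED_AS_BACKUP }

    static Outcome run(BooleanSupplier tryAcquireLock,
                       BooleanSupplier tryReconnectToPrimary,
                       long pauseMillis) {
        while (true) {
            // Try to take over first: holding the lock lets B read shared state.
            if (tryAcquireLock.getAsBoolean()) {
                return Outcome.BECAME_PRIMARY;
            }
            // Otherwise check whether A is reachable again.
            if (tryReconnectToPrimary.getAsBoolean()) {
                return Outcome.RESUMED_AS_BACKUP;
            }
            // Assumed back-off between rounds.
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for lock", e);
            }
        }
    }
}
```

Trying the lock before the reconnect in each round mirrors the order in Response, where B first tries to obtain the lock.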
Role Reversal
When B becomes the new primary server, A can restart as a backup server, so that the two servers exchange roles.
Figure 28 Recovered Server Becomes Backup
Client Transfer
Clients of A that are configured to fail over to backup server B automatically transfer to B when it becomes the new primary server. B reads each client’s current state from shared storage so that it can deliver any persistent messages to that client.
Client Notification
Client applications can receive notification when shared state failover occurs.
Java
To receive notification, Java client programs set the system property tibco.tibjms.ft.switch.exception to any value, and define an ExceptionListener to handle failover notification; see the class com.tibco.tibjms.Tibjms in TIBCO Enterprise Message Service Java API Reference.
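A minimal sketch of the Java setup follows. Only the property name comes from the text. The value "true" is arbitrary, because any value works. This sketch assumes the property is set before the client creates its connections. Passing -Dtibco.tibjms.ft.switch.exception=true on the java command line is an equivalent, standard way to set a Java system property. The ExceptionListener registration appears only as a comment because it requires the JMS and EMS client libraries. FtSwitchNotification is a hypothetical helper name.

```java
// Hypothetical helper that enables failover notification for a Java client.
final class FtSwitchNotification {

    // Property name from the EMS documentation; any value enables it.
    static final String FT_SWITCH_PROPERTY = "tibco.tibjms.ft.switch.exception";

    // Assumption: call this before the client creates its EMS connections.
    static void enable() {
        System.setProperty(FT_SWITCH_PROPERTY, "true");
        // Next, register a javax.jms.ExceptionListener on each connection with
        // Connection.setExceptionListener(...) to receive the failover notice.
    }
}
```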
C
To receive notification, C client programs call tibems_setExceptionOnFTSwitch(TIBEMS_TRUE) and register an exception callback to receive notification that the reconnection succeeded.
C#
To receive notification, .NET client programs call Tibems.SetExceptionOnFTSwitch(true), and define an exception listener to handle failover notification; see the method Tibems.SetExceptionOnFTSwitch in TIBCO Enterprise Message Service .NET API Reference.
Message Redelivery
Persistent
When a failure occurs, messages with delivery mode PERSISTENT that were not successfully acknowledged before the failure are redelivered.
Synchronous Mode
When durable subscribers are used, EMS guarantees that a message with PERSISTENT delivery mode that is written to a store with the property mode=sync will not be lost during a failure.
Delivery Succeeded
Any messages that have been successfully acknowledged or committed are not redelivered, in compliance with the JMS 1.1 specification.
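The two redelivery rules above can be expressed as a simple filter over in-flight messages. This Java sketch is hypothetical. InFlightMessage is an illustrative type, not an EMS class. The sketch lists only the messages the text guarantees will be redelivered. Unacknowledged non-persistent messages are left out because this section makes no promise about them. That omission does not assert that they are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the redelivery rules for shared state failover.
final class RedeliveryRule {

    static final class InFlightMessage {
        final String id;
        final boolean persistent;    // delivery mode PERSISTENT
        final boolean acknowledged;  // acknowledged or committed before the failure

        InFlightMessage(String id, boolean persistent, boolean acknowledged) {
            this.id = id;
            this.persistent = persistent;
            this.acknowledged = acknowledged;
        }
    }

    // Persistent, unacknowledged messages are redelivered; acknowledged or
    // committed messages never are (JMS 1.1). Others are not modeled here.
    static List<String> guaranteedRedeliveries(List<InFlightMessage> inFlight) {
        List<String> ids = new ArrayList<>();
        for (InFlightMessage m : inFlight) {
            if (m.persistent && !m.acknowledged) {
                ids.add(m.id);
            }
        }
        return ids;
    }
}
```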
Topics
All topic subscribers continue normal operation after a failover.
Transactions
A transaction is considered active when at least one message has been sent or received by the session, and the transaction has not been successfully committed.
After a failover, attempting to commit the active transaction results in a javax.jms.TransactionRolledBackException. Clients that use transactions must handle this exception and resend any messages sent during the transaction. The backup server (now acting as primary) automatically redelivers any messages that were delivered to the session during the transaction that rolled back.
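The client-side obligation described above is to resend the whole batch when a commit is rolled back. One way to meet it is a retry loop. The Java sketch below compiles without the JMS library. To make that possible, TxSession is a hypothetical minimal session interface, not the javax.jms.Session API. RolledBackException is a stand-in for javax.jms.TransactionRolledBackException. The retry limit is also an assumption.

```java
import java.util.List;

// Stand-in for javax.jms.TransactionRolledBackException.
final class RolledBackException extends Exception {
    RolledBackException(String message) {
        super(message);
    }
}

// Hypothetical minimal view of a transacted session.
interface TxSession {
    void send(String body);
    void commit() throws RolledBackException;
}

final class TransactedSender {

    // Sends a batch in one transaction. If a failover rolls the commit back,
    // resend every message of the batch and commit again, up to maxAttempts.
    // Returns the number of attempts that were needed.
    static int sendBatch(TxSession session, List<String> batch, int maxAttempts)
            throws RolledBackException {
        for (int attempt = 1; ; attempt++) {
            for (String body : batch) {
                session.send(body);
            }
            try {
                session.commit();
                return attempt;
            } catch (RolledBackException e) {
                if (attempt >= maxAttempts) {
                    throw e;  // give up and let the caller decide
                }
                // Sends from the rolled-back transaction were discarded, so the
                // loop resends the whole batch. Messages received in that
                // transaction are redelivered by the server, not resent here.
            }
        }
    }
}
```

A single fault-tolerance switch costs one extra round of sends. The retry limit keeps a persistent failure from looping forever.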
Queues
For queue receivers, any message that was sent to a receiver but not acknowledged before the failover may be sent to another receiver immediately after the failover.
A receiver that tries to acknowledge a message after a failover may receive a javax.jms.IllegalStateException. This exception signifies that the attempted acknowledgement is for a message that has already been sent to another queue receiver. The exception occurs only in this scenario or when the session or connection has been closed. It cannot occur if there is only one receiver at the time of a failover, but it may occur for exclusive queues if more than one receiver was started for that queue.
When a queue receiver catches a javax.jms.IllegalStateException, the best course of action is to call the Session.recover() method. Your application program should also be prepared to handle redelivery of messages in this situation. All queue messages that can be redelivered to another queue receiver after a failover always have the header field JMSRedelivered set to true; application programs must check this header to avoid duplicate processing of the same message in the case of redelivery.
Acknowledged messages are never redelivered (in compliance with the JMS specification). The case described above occurs when the application cannot acknowledge a message because of a failover.
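After the receiver catches the exception and calls Session.recover() as recommended above, it still has to avoid processing a redelivered message twice. This hypothetical Java sketch shows a duplicate guard keyed on the message ID and the JMSRedelivered flag. ReceivedMessage is an illustrative type. In a JMS client the two fields would come from Message.getJMSMessageID() and Message.getJMSRedelivered(). Keeping processed IDs in an in-memory set is an assumption. A production receiver would need bounded, and possibly durable, storage.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical duplicate guard for a queue receiver after a failover.
final class RedeliveryGuard {

    static final class ReceivedMessage {
        final String id;            // JMSMessageID
        final boolean redelivered;  // JMSRedelivered header

        ReceivedMessage(String id, boolean redelivered) {
            this.id = id;
            this.redelivered = redelivered;
        }
    }

    // IDs this application has already processed (assumed in-memory store).
    private final Set<String> processed = new HashSet<>();

    // Returns true if the message should be processed, false if it is a
    // redelivered copy of a message that was already handled.
    boolean shouldProcess(ReceivedMessage m) {
        if (m.redelivered && processed.contains(m.id)) {
            return false;
        }
        processed.add(m.id);
        return true;
    }
}
```

Checking the set only when JMSRedelivered is true follows the text's guidance that every message redeliverable after a failover carries that flag. First deliveries therefore skip the lookup.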
Heartbeat Parameters
When the primary server’s heartbeat stops, the backup server waits until its activation interval, measured from the most recent heartbeat it detected, has elapsed. It then retrieves information from shared storage and assumes the role of primary server.
The default heartbeat interval is 3 seconds, and the default activation interval is 10 seconds. The activation interval must be at least twice the heartbeat interval. Both intervals are specified in seconds. You can set these intervals in the server configuration files. See Fault Tolerance Parameters for details.
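The interval rule can be checked before a configuration is deployed. This Java sketch is hypothetical. FtIntervals is an illustrative name, and requiring a positive heartbeat interval is an added assumption. The defaults and the "at least twice" constraint come from the text.

```java
// Hypothetical check of the heartbeat and activation interval constraint.
final class FtIntervals {

    static final int DEFAULT_HEARTBEAT_SECONDS = 3;
    static final int DEFAULT_ACTIVATION_SECONDS = 10;

    // The activation interval must be at least twice the heartbeat interval.
    // Requiring heartbeatSeconds > 0 is an assumption of this sketch.
    static boolean isValid(int heartbeatSeconds, int activationSeconds) {
        return heartbeatSeconds > 0 && activationSeconds >= 2 * heartbeatSeconds;
    }
}
```

With the defaults, 10 s of silence spans a little more than three 3-second heartbeat intervals. The backup therefore activates only after roughly three consecutive heartbeats are missed. A pair such as 5 s and 9 s fails the check.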