Copyright © TIBCO Software Inc. All Rights Reserved



Shared State Failover Process
This section presents details of the shared state failover sequence.
Detection
A standby server detects a failure of the active server in either of two ways:
Heartbeat Failure—The active server sends heartbeat messages to the standby server to indicate that it is still operating. When a network failure stops the servers from communicating with each other, the standby server detects the interruption in the steady stream of heartbeats. For details, see Heartbeat Parameters.
Connection Failure—The standby server can detect the failure of its TCP connection with the active server. When the active server process terminates unexpectedly, the standby server detects the broken connection.
Response
When a standby server (B) detects the failure of the active server (A), B attempts to assume the role of active server. First, B obtains the lock on the current shared state. Once B holds the lock and can access the shared state, it becomes the new active server.
Figure 22 Failed Active Server
Lock Unavailable
If B cannot obtain the lock immediately, it alternates between attempting to obtain the lock (and become the active server), and attempting to reconnect to A (and resume as a standby server)—until one of these attempts succeeds.
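The alternation described above can be pictured as a simple loop. The following Java sketch is illustrative only: it is not the server's implementation, and the two suppliers (tryObtainLock and tryReconnectToActive) are hypothetical stand-ins for the lock attempt and the reconnection attempt. Pacing between attempts is omitted.

```java
import java.util.function.BooleanSupplier;

/** Illustrative model of standby server B deciding its role; not server code. */
final class StandbyTakeoverSketch {
    enum Role { ACTIVE, STANDBY }

    /**
     * Alternates between the two attempts until one succeeds.
     * Both suppliers are hypothetical: each returns true when its attempt succeeds.
     */
    static Role resolveRole(BooleanSupplier tryObtainLock,
                            BooleanSupplier tryReconnectToActive) {
        while (true) {
            if (tryObtainLock.getAsBoolean()) {
                return Role.ACTIVE;   // B holds the lock and becomes the active server
            }
            if (tryReconnectToActive.getAsBoolean()) {
                return Role.STANDBY;  // A is reachable again, so B resumes as standby
            }
        }
    }

    public static void main(String[] args) {
        int[] lockAttempts = {0};
        // Simulation: the lock becomes available on the third attempt; A never returns.
        Role role = resolveRole(() -> ++lockAttempts[0] >= 3, () -> false);
        System.out.println(role + " after " + lockAttempts[0] + " lock attempts");
    }
}
```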
Role Reversal
When B becomes the new active server, A can restart as a standby server, so that the two servers exchange roles.
Figure 23 Recovered Server Becomes Standby
Client Transfer
Clients of A that are configured to fail over to standby server B automatically transfer to B when it becomes the new active server. B reads each client's current state from the shared storage so that it can deliver any persistent messages to that client.
Client Notification
Client applications can receive notification when shared state failover occurs.
Java
To receive notification, Java client programs set the system property tibco.tibjms.ft.switch.exception to any value, and define an ExceptionListener to handle failover notification; see the class com.tibco.tibjms.Tibjms in TIBCO Enterprise Message Service Java API Reference.
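A minimal sketch of the two steps follows. So that it compiles without the JMS and EMS libraries, it uses a local stand-in interface in place of javax.jms.ExceptionListener (whose real callback takes a JMSException). The CountingListener class is hypothetical, and setting the property before connections are created is an assumption. How to recognize a failover notification among other exceptions is covered in the API reference, not here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for javax.jms.ExceptionListener; the real callback takes a JMSException. */
interface ExceptionListenerStandIn {
    void onException(Exception e);
}

final class FtSwitchNotificationSketch {
    /** Property name taken from the text above; any value enables notification. */
    static final String FT_SWITCH_PROPERTY = "tibco.tibjms.ft.switch.exception";

    /** Assumption: call this before the client creates its connections. */
    static void enableFailoverNotification() {
        System.setProperty(FT_SWITCH_PROPERTY, "true");
    }

    /** Hypothetical listener that counts and logs the notifications it receives. */
    static final class CountingListener implements ExceptionListenerStandIn {
        private final AtomicInteger notifications = new AtomicInteger();

        @Override
        public void onException(Exception e) {
            notifications.incrementAndGet();
            System.err.println("Connection event (possibly a failover): " + e.getMessage());
        }

        int count() {
            return notifications.get();
        }
    }

    public static void main(String[] args) {
        enableFailoverNotification();
        CountingListener listener = new CountingListener();
        // With the real API, register it instead: connection.setExceptionListener(listener);
        listener.onException(new Exception("simulated failover notification"));
        System.out.println("notifications received: " + listener.count());
    }
}
```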
C
To receive notification, C client programs call tibems_setExceptionOnFTSwitch(TIBEMS_TRUE) and register an exception callback, which receives the notification that the reconnection was successful.
C#
To receive notification, .NET client programs call Tibems.SetExceptionOnFTSwitch(true), and define an exception listener to handle failover notification; see the method Tibems.SetExceptionOnFTSwitch in TIBCO Enterprise Message Service .NET API Reference.
Message Redelivery
Persistent
When a failure occurs, messages with delivery mode PERSISTENT that were not successfully acknowledged before the failure are redelivered.
Synchronous Mode
When using durable subscribers, EMS guarantees that a message with PERSISTENT delivery mode that is written to a store with the property mode=sync will not be lost during a failure.
Delivery Succeeded
Any messages that have been successfully acknowledged or committed are not redelivered, in compliance with the JMS specification.
Topics
All topic subscribers continue normal operation after a failover.
Transactions
A (non-XA) transaction is considered active when at least one message has been sent or received by the session, and the transaction has not been successfully committed. An XA transaction is considered active when the XA start method is called.
After a failover, attempting to commit the active transaction results in a javax.jms.TransactionRolledBackException. Clients that use transactions must handle this exception, and resend any messages sent during the transaction. The standby server, upon becoming active, automatically redelivers any messages that were delivered to the session during the transaction that rolled back.
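One way a sending client might handle the rollback is to resend the whole batch and commit again. The sketch below is an assumption-laden illustration, not the EMS API: RolledBackStandIn stands in for javax.jms.TransactionRolledBackException, TxSession is a hypothetical minimal view of a transacted session, and FailsOnceSession is an in-memory double that simulates one failover.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for javax.jms.TransactionRolledBackException. */
class RolledBackStandIn extends Exception {
    RolledBackStandIn(String message) { super(message); }
}

/** Hypothetical minimal view of a transacted session. */
interface TxSession {
    void send(String body);
    void commit() throws RolledBackStandIn;
}

final class TransactedSendSketch {
    /** Sends the batch in one transaction, resending it after a rollback. Returns attempts used. */
    static int sendAll(TxSession session, List<String> batch, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            for (String body : batch) {
                session.send(body);
            }
            try {
                session.commit();
                return attempt;
            } catch (RolledBackStandIn e) {
                // The sends of the rolled-back transaction are gone; the next pass resends them.
            }
        }
        throw new IllegalStateException("commit still rolled back after " + maxAttempts + " attempts");
    }

    /** In-memory double: the first commit fails as if a failover had occurred. */
    static final class FailsOnceSession implements TxSession {
        private final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        private boolean failedOnce = false;

        @Override
        public void send(String body) { pending.add(body); }

        @Override
        public void commit() throws RolledBackStandIn {
            if (!failedOnce) {
                failedOnce = true;
                pending.clear();
                throw new RolledBackStandIn("simulated failover");
            }
            committed.addAll(pending);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        FailsOnceSession session = new FailsOnceSession();
        int attempts = sendAll(session, List.of("a", "b"), 3);
        System.out.println("committed " + session.committed + " in " + attempts + " attempts");
    }
}
```

The attempt limit is a design choice of the sketch: it keeps a client from retrying indefinitely if commits keep rolling back.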
Queues
For queue receivers, any messages that have been sent to receivers, but have not been acknowledged before the failover, may be sent to other receivers immediately after the failover.
A receiver trying to acknowledge a message after a failover may receive a javax.jms.IllegalStateException. This exception signifies that the attempted acknowledgement is for a message that has already been sent to another queue receiver. This exception occurs only in this scenario, or when the session or connection has been closed. It cannot occur if there is only one receiver at the time of a failover, but it may occur for exclusive queues if more than one receiver was started for that queue.
When a queue receiver catches a javax.jms.IllegalStateException, the best course of action is to call the Session.recover() method. Your application program should also be prepared to handle redelivery of messages in this situation. All queue messages that can be redelivered to another queue receiver after a failover have the header field JMSRedelivered set to true; application programs must check this header to avoid duplicate processing of the same message in the case of redelivery.
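The receiver-side handling can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed pattern: AckRejectedStandIn stands in for javax.jms.IllegalStateException, ReceivedMessage and RecoverableSession are hypothetical minimal views of a JMS message and session, and duplicates are detected with an in-memory set of message IDs. With several receivers, that set would need to be shared between them, which the sketch does not attempt.

```java
import java.util.HashSet;
import java.util.Set;

/** Stand-in for javax.jms.IllegalStateException. */
class AckRejectedStandIn extends Exception {
    AckRejectedStandIn(String message) { super(message); }
}

/** Hypothetical minimal view of a received message. */
interface ReceivedMessage {
    String id();            // corresponds to the JMSMessageID header
    boolean redelivered();  // corresponds to the JMSRedelivered header
    void acknowledge() throws AckRejectedStandIn;
}

/** Hypothetical minimal view of a session; recover() mirrors Session.recover(). */
interface RecoverableSession {
    void recover();
}

final class QueueReceiverSketch {
    enum Outcome { PROCESSED, SKIPPED_DUPLICATE, RECOVERED }

    private final Set<String> processedIds = new HashSet<>();
    private final RecoverableSession session;

    QueueReceiverSketch(RecoverableSession session) {
        this.session = session;
    }

    Outcome handle(ReceivedMessage message, Runnable businessLogic) {
        // Only a redelivered message can be a duplicate, so check the header first.
        boolean duplicate = message.redelivered() && processedIds.contains(message.id());
        if (!duplicate) {
            businessLogic.run();
            processedIds.add(message.id());
        }
        try {
            message.acknowledge();
        } catch (AckRejectedStandIn e) {
            // The message has already been sent to another receiver: recover the session.
            session.recover();
            return Outcome.RECOVERED;
        }
        return duplicate ? Outcome.SKIPPED_DUPLICATE : Outcome.PROCESSED;
    }

    /** Builds an in-memory message double for simulation. */
    static ReceivedMessage message(String id, boolean redelivered, boolean ackFails) {
        return new ReceivedMessage() {
            @Override public String id() { return id; }
            @Override public boolean redelivered() { return redelivered; }
            @Override public void acknowledge() throws AckRejectedStandIn {
                if (ackFails) {
                    throw new AckRejectedStandIn("message already sent to another receiver");
                }
            }
        };
    }
}
```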
Heartbeat Parameters
When the active server heartbeat stops, the standby server waits for its activation interval (elapsed time since it detected the most recent heartbeat); then the standby server retrieves information from shared storage and assumes the role of active server.
The default heartbeat interval is 3 seconds, and the default activation interval is 10 seconds. The activation interval must be at least twice the heartbeat interval. Both intervals are specified in seconds. You can set these intervals in the server configuration files. See Fault Tolerance Parameters for details.
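The relationship between the two intervals can be expressed as a small check. The Java sketch below is illustrative only: the class and method names are hypothetical, and activating exactly when the elapsed time reaches the activation interval is an assumption about the boundary case.

```java
import java.time.Duration;

/** Illustrative model of the heartbeat and activation intervals; not server code. */
final class HeartbeatIntervalsSketch {
    /** The activation interval must be at least twice the (positive) heartbeat interval. */
    static boolean isValid(Duration heartbeat, Duration activation) {
        return !heartbeat.isNegative()
                && !heartbeat.isZero()
                && activation.compareTo(heartbeat.multipliedBy(2)) >= 0;
    }

    /** True once the time since the most recent heartbeat reaches the activation interval. */
    static boolean shouldActivate(Duration sinceLastHeartbeat, Duration activation) {
        return sinceLastHeartbeat.compareTo(activation) >= 0;
    }

    public static void main(String[] args) {
        Duration heartbeat = Duration.ofSeconds(3);   // default heartbeat interval
        Duration activation = Duration.ofSeconds(10); // default activation interval
        System.out.println("defaults valid: " + isValid(heartbeat, activation));
        System.out.println("activate after 9 s of silence: "
                + shouldActivate(Duration.ofSeconds(9), activation));
        System.out.println("activate after 10 s of silence: "
                + shouldActivate(Duration.ofSeconds(10), activation));
    }
}
```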
Configuration Files
When an active server fails, its standby server assumes the status of the active server and resumes operation. Before becoming the active server, the standby server re-reads its configuration files. If the two servers share configuration files, then the administrative changes to an active server carry over to its standby once the latter becomes active.
When fault-tolerant servers share configuration files, you must limit configuration changes to the active server only. Separately reconfiguring the standby server can cause it to overwrite the shared configuration files; unintended misconfiguration can result.
