Fault Tolerant Messaging Using EMS

You can set up fault tolerance for the TIBCO MDM messaging system, for a single server or a cluster, using the TIBCO EMS messaging software. The EMS fault tolerant setup consists of a primary EMS server and a standby (backup) server.

The two servers share a data store (typically on the file system) that contains client information and message information. Initially, the primary server is active and the backup server monitors it. When the primary server (or its host machine) fails, the backup server detects the failure and becomes active. The messaging client (TIBCO MDM) also detects the failure and transparently reconnects to the now active backup server.

Both the TIBCO MDM cluster and the EMS messaging software have to be configured for this deployment scenario.

EMS Server Setup

For details, refer to the "Configuring Fault-Tolerant Servers" section of the TIBCO EMS User’s Guide.

To configure the two EMS servers as a fault tolerant cluster, edit the configuration file of each message server (for example: EMS-Configuration/tibco/cfgmgmt/ems/data/tibemsd.conf). Both files must use the same server name (the server property), because the two processes represent the same logical server. Fault tolerance is configured through the properties whose names start with ft_. The most important one is ft_active, which points to the network address of the other message server.

The other properties (ft_heartbeat, ft_activation, ft_reconnect_timeout) can be left at their default values. During setup, start the primary server first and then the backup server. The backup server should print a message similar to "Server is in standby mode for tcp://myhost:7222".
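As an illustration, the fault tolerance entries in the two tibemsd.conf files might look like the following sketch. The server name (EMS-SERVER), the host names (emshost1, emshost2), the port, and the shared store path are hypothetical placeholders, not values taken from a real installation. Confirm the exact property syntax against the TIBCO EMS User's Guide before using it.

```
# tibemsd.conf for the primary server (hypothetical host emshost1)
server    = EMS-SERVER                # must be identical in both files
listen    = tcp://emshost1:7222
store     = /shared/ems/datastore     # shared data store (placeholder path)
ft_active = tcp://emshost2:7222       # address of the other server

# tibemsd.conf for the backup server (hypothetical host emshost2)
server    = EMS-SERVER                # same name as on the primary
listen    = tcp://emshost2:7222
store     = /shared/ems/datastore     # same shared data store
ft_active = tcp://emshost1:7222       # address of the other server

# ft_heartbeat, ft_activation, and ft_reconnect_timeout are omitted here
# so that they keep their default values.
```

Each file names the other server in ft_active, so whichever process starts second finds the active server and goes into standby mode.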

TIBCO MDM Setup

The EMS Cluster has to be registered with TIBCO MDM. This is achieved by having multiple entries separated by a comma in the Cluster Server List property for both the Bus (Topic) and Queue setup.

In addition to the primary EMS server (LocalhostServer), add the second server (Server2), which serves as the backup server, to the Cluster Server List configuration value. Use Add New Property to define the Server2 Server Connection String and Server2 Server encoding properties. Both must be string values.

Also, verify that the Failed Connection Refresh Flag and Failed Connection Replace Optimization Flag properties are set to true.

If a connection fails, each TIBCO MDM server retries several times to reconnect to the backup server. Set the Failed Connection Retry Count to 6 and the delay between attempts to 10000 ms (10 seconds). These values work well with the default EMS cluster setup. The delay should not be less than 10 seconds; use the same value for ft_activation in the EMS cluster setup. The total time that a TIBCO MDM server spends trying to reconnect to the backup server (6 * 10 seconds = 60 seconds) should not exceed ft_reconnect_timeout (60 seconds by default), because any retry time beyond that timeout is not useful.
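The timing rules above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function and parameter names are hypothetical and are not part of TIBCO MDM or EMS. It assumes the guidance is read as "the retry delay must not be shorter than ft_activation, and the total retry window must not exceed ft_reconnect_timeout". The ft_activation default of 10 seconds is an assumption chosen to match the recommended delay.

```python
def retry_window_seconds(retry_count: int, retry_delay_ms: int) -> float:
    """Total time (in seconds) spent on reconnect attempts."""
    return retry_count * retry_delay_ms / 1000.0


def check_retry_settings(
    retry_count: int,
    retry_delay_ms: int,
    ft_activation_s: float = 10.0,         # assumed to match the 10 s delay
    ft_reconnect_timeout_s: float = 60.0,  # default stated in this section
) -> list[str]:
    """Return a list of warnings; an empty list means the values are consistent."""
    warnings = []
    delay_s = retry_delay_ms / 1000.0
    if delay_s < ft_activation_s:
        warnings.append(
            f"retry delay {delay_s:g}s is shorter than ft_activation {ft_activation_s:g}s"
        )
    window_s = retry_window_seconds(retry_count, retry_delay_ms)
    if window_s > ft_reconnect_timeout_s:
        warnings.append(
            f"retry window {window_s:g}s exceeds ft_reconnect_timeout "
            f"{ft_reconnect_timeout_s:g}s"
        )
    return warnings


if __name__ == "__main__":
    # Recommended values from this section: 6 attempts, 10000 ms apart.
    print(retry_window_seconds(6, 10000))
    print(check_retry_settings(6, 10000))
```

With the recommended values, the retry window equals the default ft_reconnect_timeout exactly, so raising either the retry count or the delay would also require raising ft_reconnect_timeout on the EMS side.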

Copy the EMS-Configuration\tibco\cfgmgmt\ems\data folder, rename the copy (for example, to data_secondary), and use it for the secondary server.

The same procedure has to be repeated for the TIBCO EMS Queue setup at InitialConfig > Queue Setup > Cluster > TIBCO EMS.

Queue and Topic Setup

For queue and topic setup, see the "Configuring Queues and Topics" section of TIBCO MDM System Administration.