Configuring Fault Tolerant Process Engines

Copyright © Cloud Software Group, Inc. All Rights Reserved

The TIBCO ActiveMatrix BusinessWorks process engine can be configured to be fault-tolerant. You can start up several engines. In the event of a failure, other engines restart process starters and the corresponding services.

If you use a database to store process engine information, a process instance is re-instantiated to the state of its last checkpoint. In the event of a failure, any processing done after a checkpoint is lost when the process instance is restarted by another engine. See TIBCO ActiveMatrix BusinessWorks Palette Reference for more information about Checkpoint activities. See Changing the Checkpoint Data Repository for a Process for more information about configuring process engine storage.

Figure 4 illustrates normal operation of a fault-tolerant configuration. One engine is configured as the master, and it creates and executes services. The second engine is a secondary engine, and it stands by in case the master fails. The engines send heartbeats to notify each other that they are operating normally.

Figure 4 Normal operation: master processing while secondary stands by

If the master process engine fails, the secondary engine detects the stop in the master’s heartbeat and resumes operation in place of the master. All process starters are restarted on the secondary, and services are restarted to the state of their last checkpoint. Figure 5 illustrates a failure and the secondary restarting the service.

Figure 5 Fault-tolerant failover
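The heartbeat-based detection described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the class name, the function name, and the timeout parameter are all hypothetical, and timestamps are passed in explicitly so that the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeartbeatMonitor:
    """Tracks the most recent heartbeat seen from the master (illustrative only)."""
    timeout_seconds: float       # hypothetical: how long silence is tolerated
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        # Called each time a heartbeat arrives from the master.
        self.last_heartbeat = now

    def master_has_failed(self, now: float) -> bool:
        # The master is presumed failed once it has been silent too long.
        return (now - self.last_heartbeat) > self.timeout_seconds


def failover(monitor: HeartbeatMonitor, now: float,
             process_starters: List[str]) -> List[str]:
    """Return the process starters the secondary restarts.

    An empty list means the master is still healthy and the
    secondary keeps standing by.
    """
    if monitor.master_has_failed(now):
        return list(process_starters)
    return []
```

With a 5-second timeout and a last heartbeat at time 10, a check at time 14 leaves the secondary standing by, while a check at time 16 triggers the restart of every process starter.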

The expected deployment is for master and secondary engines to reside on separate machines. You can have multiple secondary engines, if desired, and you can specify a weight for each engine. The weight determines the type of relationship between the fault-tolerant engines. See Peer or Master and Secondary Relationships for more information about relationships between fault-tolerant engines.

A master and its secondary engines are known as a fault-tolerant group. The group can be configured with several advanced configuration options, such as the heartbeat interval and the weight of each group member. See TIBCO ActiveMatrix BusinessWorks Palette Reference for a complete description of configuration options for fault tolerance.

Peer or Master and Secondary Relationships

Members of a fault-tolerant group can be configured as peers or as master and secondary engines. If all engines are peers, when the machine containing the currently active process engine fails, another peer process engine resumes processing for the first engine, and continues processing until its machine fails.

If the engines are configured as master and secondary, the secondary engine resumes processing when the master fails. The secondary engine continues processing until the master recovers. Once the master recovers, the secondary engine stands by and the master takes over processing again.

The Fault Tolerance tab of the Process Engine deployment resource allows you to specify the member weight of each member of a fault-tolerant group. The member with the highest weight is the master. You can select Peer in the first field on the tab to configure all engines as peers (that is, they all have the same weight). You can select Primary/Secondary to configure the engines as master and secondary. You can also select Custom to specify your own values for the weight of each member of the group.
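The effect of member weights on which engine is active can be sketched as below. This is an illustrative model only: the function names, the engine names, the weight values, and the alphabetical tie-break among equal-weight engines are assumptions made for the example, not documented product behavior.

```python
from typing import Dict, Optional, Set


def relationship(weights: Dict[str, int]) -> str:
    """Equal weights mean peers; otherwise the highest weight is the master."""
    return "peer" if len(set(weights.values())) == 1 else "master/secondary"


def select_active(weights: Dict[str, int], running: Set[str],
                  current_active: Optional[str] = None) -> Optional[str]:
    """Pick the engine that should be processing.

    The highest-weight running engine wins. If the currently active
    engine is still running and already has the top weight, it stays
    active, which models a peer continuing until its machine fails.
    """
    candidates = {name: w for name, w in weights.items() if name in running}
    if not candidates:
        return None
    top = max(candidates.values())
    if current_active in candidates and candidates[current_active] == top:
        return current_active
    # Assumption: ties are broken alphabetically to keep the example deterministic.
    return min(name for name, w in candidates.items() if w == top)
```

For example, with weights `{"A": 100, "B": 50}`, engine B becomes active while A is down and A takes over again once it is back. With equal weights, B stays active after A returns.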

Process Starters and Fault-Tolerance

When a master process engine fails, its process starters are restarted on the secondary engine. This may not be possible with all process starters. For example, the HTTP Receiver process starter listens for HTTP requests on a specified port on the machine where the process engine resides. If a secondary engine resumes operation for a master engine, the new machine is now listening for HTTP requests on the specified port. HTTP requests always specify the machine name, so incoming HTTP requests will not automatically be redirected to the new machine.

Each process starter has different configuration requirements, and not all process starters may gracefully resume on a different machine. You may have to provide additional hardware or software to redirect the incoming events to the appropriate place in the event of a failure.

Also, your servers may not have all of the software necessary for restarting all process instances. For example, your database may reside on the same machine as your master process engine. If that server goes down, any JDBC activities will not be able to execute. Therefore, you may not wish to load process definitions that use JDBC activities in your secondary process engine.

You can specify that your secondary process engine loads different process definitions than the master. You may only want to load the process definitions that can gracefully migrate to a new server during a failure.
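One way to reason about which process definitions to load on a secondary engine is to compare what each definition needs with what remains reachable from the secondary's machine after a failure. The sketch below is a planning aid only; the function name, the process definition names, and the resource labels are hypothetical and do not correspond to any product setting.

```python
from typing import Dict, List, Set


def definitions_for_secondary(requirements: Dict[str, Set[str]],
                              available: Set[str]) -> List[str]:
    """Return the definitions whose required resources are all available.

    requirements maps a process definition name to the resources it needs;
    available is the set of resources still reachable from the secondary.
    """
    return sorted(name for name, needs in requirements.items()
                  if needs <= available)


# Hypothetical example: the database lives on the master's machine,
# so only JMS is still reachable from the secondary after a failure.
requirements = {
    "OrderIntake": {"jms"},
    "OrderAudit": {"jms", "jdbc"},
}
safe_to_load = definitions_for_secondary(requirements, {"jms"})
```

Here only `OrderIntake` qualifies, because `OrderAudit` depends on JDBC activities that cannot execute once the master's machine is down.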

Setting Fault Tolerant Options

The FT Group Settings panel displays only if the TIBCO ActiveMatrix BusinessWorks process you have selected has been added to at least two (different) machines. If your domain includes components that were deployed as part of a fault-tolerant group, the display includes the information about the group.

You can start one or more process engines in the group. If more than one engine has started, only one is displayed as Running and all other engines are displayed as Standing By (or, initially, as Starting Up).

When you change the status of a component that has been deployed as part of an FT group, the status change affects all other members of the group.

• After you have deployed the process engines, it is most efficient to select all process engines by clicking the check boxes, and then choosing Start. After the primary and secondary engines have communicated, the master will display as Running and all other engines as Standby. If you start only the primary, it will first go to Standby mode as it checks the status of the other engines. It then changes to Running.

• If you shut down a process engine, the appropriate secondary engine starts automatically.

1. In TIBCO Administrator, click Application Management.

2. Select an application and expand it.

3. In the Configuration Builder pane, click a process name. A process is named with a .par suffix.

4. Click the General tab.

5. Select Run Fault Tolerant. Change other options as required. See FT Group Settings for field descriptions.

6. Click Save.

Changing the Checkpoint Data Repository for a Process

A checkpoint saves the current state of a running process instance. For a secondary process engine to resume running process instances from their last checkpoint, the secondary process engine must have access to the saved state of the process instances from the master process engine.

Features that allow communication across process engines (for example, wait/notify, critical sections, shared variables, and so on) require a database for storage of process engine state.

If you are running process engines that do not communicate with each other, then the file system can be used for process engine storage. In this case, if you configure primary and secondary engines for fault tolerance, all engines must point to the same shared location within the file system.

The remainder of this section describes using a database for process engine storage.

Because fault-tolerant engines are expected to be on separate machines, you should configure each process engine to use a database for storage. This allows you to specify the same JDBC Connection resource for the master and secondary engines, so that all engines share the information stored for process instance checkpoints.

If all engines share the checkpoint information, the secondary engines can recover process instances up to their last checkpoint. If engines do not share the checkpoint information, process instances are not restarted.
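The role of shared checkpoint storage can be illustrated with a small model. In this sketch a Python dictionary stands in for the shared database, and the class name, instance identifier, and state fields are hypothetical; it only demonstrates why work done after the last checkpoint is lost and why an engine without access to the shared data cannot restart the instance.

```python
from typing import Dict, Optional


class CheckpointStore:
    """Stands in for checkpoint storage; a dict keyed by process instance id."""

    def __init__(self) -> None:
        self._saved: Dict[str, Dict[str, int]] = {}

    def save(self, instance_id: str, state: Dict[str, int]) -> None:
        # Store a copy so later in-memory changes do not alter the checkpoint.
        self._saved[instance_id] = dict(state)

    def load(self, instance_id: str) -> Optional[Dict[str, int]]:
        # None means there is no checkpoint, so the instance cannot be restarted.
        state = self._saved.get(instance_id)
        return dict(state) if state is not None else None


shared = CheckpointStore()

# The master processes two steps and then takes a checkpoint.
state = {"step": 2, "total": 30}
shared.save("order-1", state)

# The master does more work in memory and then fails before the next checkpoint.
state["step"] = 3
state["total"] = 60

# A secondary using the same store resumes from the last checkpoint.
recovered = shared.load("order-1")

# A secondary with its own separate storage finds nothing to resume.
unshared = CheckpointStore().load("order-1")
```

The recovered state reflects step 2, not step 3, so the work done after the checkpoint must be repeated by the secondary engine.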

To change checkpoint data repository properties, perform the following procedure:

1. In TIBCO Administrator, click Application Management.

2. Select an application and expand it.

3. In the Configuration Builder pane, click a process name. A process is named with a .par suffix.

4. Click the Advanced tab.

5. Change properties as required. The value defaults to Checkpoint Data Repository. If a JDBC Connection Resource has been configured for the project, you also have the option to choose database.

6. Click Save.