Multiple Groups

In some situations, a process joins more than one fault tolerance group. Each group protects a specific role that the process plays within a larger distributed application.

Example: Mutual Backup across a WAN

Figure 32, Mutual Backup across a WAN, illustrates a situation with two levels of fault tolerance. Network sites in Tokyo and Seattle are connected by a WAN link, and Rendezvous routing daemons forward messages between the two sites on demand.

At each site, a pair of computation servers listens for client requests, processes each request, and sends the results to the client.

Local Fault Tolerance Coverage

The volume of requests is low enough that one process can handle them all. However, the query service is critical to the enterprise, so each site runs two process instances, which cooperate for fault tolerance. In Tokyo the processes are A and B; in Seattle, J and K.

The active Tokyo process listens for requests that carry the subject name TOKYO.REQUEST. The active Seattle process listens for requests that carry the subject name SEATTLE.REQUEST.

To administer fault tolerance at the Tokyo site, processes A and B join a fault tolerance group named TOKYO.APP1. A has higher weight than B, so A is initially active. The group’s active goal is one, so only one member of the group actively processes requests at any time. Similarly, processes J and K in Seattle join a group named SEATTLE.APP1. J has higher weight than K, so J is initially active.

If the active member at either site fails, the inactive member at the same site takes its place.
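The weight rule for a single local group can be modeled in a few lines. This is a minimal sketch, not the Rendezvous API: the class `FaultToleranceGroup`, its methods, and the weights 20 and 10 are hypothetical, and the model ignores heartbeats and activation timing entirely. It assumes only what the text states: the surviving members with the highest weights fill the active goal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FaultToleranceGroup:
    """Hypothetical model of one fault tolerance group (not the Rendezvous API)."""

    name: str
    active_goal: int
    weights: Dict[str, int] = field(default_factory=dict)  # member -> weight
    failed: Set[str] = field(default_factory=set)

    def join(self, member: str, weight: int) -> None:
        self.weights[member] = weight

    def fail(self, member: str) -> None:
        # A failed or withdrawn member is no longer a candidate for activation.
        self.failed.add(member)

    def active_members(self) -> List[str]:
        # Rank surviving members by weight; the heaviest ones fill the active goal.
        alive = [m for m in self.weights if m not in self.failed]
        alive.sort(key=lambda m: self.weights[m], reverse=True)
        return alive[: self.active_goal]


tokyo = FaultToleranceGroup("TOKYO.APP1", active_goal=1)
tokyo.join("A", 20)  # hypothetical weights; the text only says A outranks B
tokyo.join("B", 10)
print(tokyo.active_members())  # ['A']
tokyo.fail("A")
print(tokyo.active_members())  # ['B']
```

Because the active goal is one, the slice keeps only the single heaviest survivor, which is why B takes over as soon as A fails.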

Figure 32: Mutual Backup across a WAN

Long-Distance Fault Tolerance Coverage

Although it is unlikely, both request servers at a site might fail at the same time. As long as the WAN link still works, the Seattle site can serve as a backup for the Tokyo site, and vice versa.

For long-distance fault tolerance coverage, Seattle processes J and K join the Tokyo fault tolerance group, TOKYO.APP1, and Tokyo processes A and B join the Seattle fault tolerance group, SEATTLE.APP1. The table in Figure 32, Mutual Backup across a WAN, lists the relative weights of all four processes in each of the two fault tolerance groups. Notice that within each group, local members have higher weight than distant members, so a distant member activates only when both local members fail or withdraw from the group.
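The following sketch shows how membership in two groups translates into roles for each process. The numeric weights are hypothetical, since the figure's table is not reproduced here. They respect the ordering the text gives: local members outrank distant ones, A outranks B, J outranks K, and K outranks J in TOKYO.APP1 (because K, not J, takes over when A and B fail). The distant ordering in SEATTLE.APP1 (B above A) is an assumption made by symmetry. The helpers `active_members` and `roles` are illustrative only.

```python
from typing import Dict, FrozenSet, List

# Hypothetical weights. The text fixes only the ordering: local members
# outrank distant ones, A > B and J > K locally, and K > J in TOKYO.APP1.
# B > A in SEATTLE.APP1 is an assumption made by symmetry.
GROUPS: Dict[str, Dict[str, int]] = {
    "TOKYO.APP1": {"A": 40, "B": 30, "K": 20, "J": 10},
    "SEATTLE.APP1": {"J": 40, "K": 30, "B": 20, "A": 10},
}
# The request subject that the active member of each group listens to.
SUBJECTS = {"TOKYO.APP1": "TOKYO.REQUEST", "SEATTLE.APP1": "SEATTLE.REQUEST"}
PROCESSES = ("A", "B", "J", "K")


def active_members(weights: Dict[str, int], failed: FrozenSet[str],
                   active_goal: int = 1) -> List[str]:
    # The surviving members with the highest weights fill the active goal.
    alive = sorted((m for m in weights if m not in failed),
                   key=lambda m: weights[m], reverse=True)
    return alive[:active_goal]


def roles(failed: FrozenSet[str] = frozenset()) -> Dict[str, List[str]]:
    """Map each process to the request subjects it actively serves."""
    result: Dict[str, List[str]] = {p: [] for p in PROCESSES}
    for group, weights in GROUPS.items():
        for member in active_members(weights, failed):
            result[member].append(SUBJECTS[group])
    return result


print(roles())
# {'A': ['TOKYO.REQUEST'], 'B': [], 'J': ['SEATTLE.REQUEST'], 'K': []}
print(roles(frozenset({"A", "B"})))
# {'A': [], 'B': [], 'J': ['SEATTLE.REQUEST'], 'K': ['TOKYO.REQUEST']}
```

Under these assumed weights, if J also fails, K becomes the active member of both groups and serves both subjects. In that case a single process plays two roles at once.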

If both Tokyo processes A and B fail, Seattle process K takes their place, because K has the highest weight among the surviving members of TOKYO.APP1. When K receives a prepare-to-activate hint, it begins listening to the subject TOKYO.REQUEST. When the Rendezvous routing daemon detects this new listening interest, it begins forwarding messages with the subject TOKYO.REQUEST from Tokyo to Seattle, where K receives them. When Rendezvous fault tolerance software instructs K to activate, K begins processing those request messages and sends the results back to clients in Tokyo.
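The two-step handover above can be sketched as a small state machine for K's role in TOKYO.APP1. This is a hypothetical model, not the Rendezvous callback API: the `Action` enum, the `RequestServer` class, and its method names are invented for illustration. The model also assumes that requests arriving before activation are simply ignored, which the text does not specify.

```python
from enum import Enum, auto
from typing import List, Optional


class Action(Enum):
    # Hypothetical names for the two instructions described in the text.
    PREPARE_TO_ACTIVATE = auto()
    ACTIVATE = auto()


class RequestServer:
    """Hypothetical model of how K reacts to instructions for TOKYO.APP1."""

    def __init__(self, subject: str) -> None:
        self.subject = subject
        self.listening = False   # interest in the subject has been expressed
        self.processing = False  # requests are being handled and answered
        self.log: List[str] = []

    def _listen(self) -> None:
        # Expressing interest is what lets the routing daemon notice the
        # listener and start forwarding the subject across the WAN.
        if not self.listening:
            self.listening = True
            self.log.append(f"listen {self.subject}")

    def on_action(self, action: Action) -> None:
        if action is Action.PREPARE_TO_ACTIVATE:
            self._listen()  # listen early, before being told to activate
        elif action is Action.ACTIVATE:
            self._listen()  # no-op if the hint already arrived
            self.processing = True
            self.log.append(f"process {self.subject}")

    def on_request(self, request: str) -> Optional[str]:
        # Before activation, this sketch ignores requests (a real server
        # might queue them instead); afterwards it returns a result.
        if not self.processing:
            return None
        return f"result for {request}"


k = RequestServer("TOKYO.REQUEST")
k.on_action(Action.PREPARE_TO_ACTIVATE)
print(k.on_request("q1"))  # None
k.on_action(Action.ACTIVATE)
print(k.on_request("q2"))  # result for q2
print(k.log)  # ['listen TOKYO.REQUEST', 'process TOKYO.REQUEST']
```

The process listens as soon as the hint arrives so that the routing daemon has time to start forwarding requests across the WAN before K is asked to process them.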

For this example to work, the routing daemons on each side of the WAN link must exchange all messages with subjects that match _RVFT.>, so that group members at both sites can exchange fault tolerance protocol messages. For details, see the section Forward Fault Tolerance Messages across Network Boundaries in TIBCO Rendezvous Administration.
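The pattern _RVFT.> uses the Rendezvous `>` wildcard, which matches one or more trailing subject elements. By contrast, `*` matches exactly one element. The hypothetical helper `subject_matches` below is not a Rendezvous call and does not configure a routing daemon. It only illustrates which subjects such a pattern covers. The _RVFT subject names in the example are invented for illustration.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Sketch of Rendezvous-style subject wildcard matching.

    Elements are separated by dots. '*' matches exactly one element;
    '>' as the final element matches one or more remaining elements.
    """
    p_elems = pattern.split(".")
    s_elems = subject.split(".")
    for i, p in enumerate(p_elems):
        if p == ">" and i == len(p_elems) - 1:
            return len(s_elems) > i  # at least one element must remain
        if i >= len(s_elems):
            return False  # subject ran out of elements
        if p != "*" and p != s_elems[i]:
            return False
    return len(p_elems) == len(s_elems)


# Hypothetical subject names; the real internal _RVFT subjects are not listed here.
for subject in ("_RVFT.TOKYO.APP1", "_RVFT.SEATTLE.APP1", "_RVFT", "TOKYO.REQUEST"):
    print(subject, subject_matches("_RVFT.>", subject))
# _RVFT.TOKYO.APP1 True
# _RVFT.SEATTLE.APP1 True
# _RVFT False
# TOKYO.REQUEST False
```

Application subjects such as TOKYO.REQUEST fall outside _RVFT.>. In this example, the routing daemons forward those subjects on demand, when a remote listener such as K expresses interest.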