Simple Fault Tolerance with Groups
Application programs can use the group facility to coordinate fault-tolerant operation.
Consider an application program that can operate in either of two roles, according to its application logic (the table after this list summarizes them):
- A1 is the active role: the program subscribes to messages, processes each message, and sends another message in response.
- A2 is the standby role: the program subscribes to messages, but neither processes them nor sends responses.
Simple Fault Tolerance, Role Behavior

| Ordinal | Role | Description |
|---|---|---|
| 1 | A1 | Actively process messages |
| 2 or greater | A2 | Standby |
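The ordinal-to-role rule can be sketched in a few lines. This is a minimal illustration, not the group facility's API: the names `Role` and `role_for_ordinal` are hypothetical, and treating every ordinal other than 1 as standby is an assumption (a program could instead exit on some values).

```python
from enum import Enum


class Role(Enum):
    """Hypothetical names for the two roles described above."""
    A1 = "active"    # subscribe, process each message, send a response
    A2 = "standby"   # subscribe only; neither process nor respond


def role_for_ordinal(ordinal: int) -> Role:
    """Map a group ordinal to a role.

    Assumption: only ordinal 1 is active; any other value is standby.
    """
    return Role.A1 if ordinal == 1 else Role.A2
```

A program following this pattern would call such a function each time it receives a new ordinal and switch its behavior accordingly.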
When a process instance of the program starts, it joins Group_A and receives its ordinal. The first process to start (Process P1) receives ordinal 1, so according to its application logic, it enters role A1 to subscribe, receive, process, and send messages. The second process to join the group (Process P2) receives ordinal 2, so it enters the A2 standby role. As other processes join the group (Process P3 through Process P5), they receive ordinals 3, 4, and 5 in sequence and enter the A2 standby role. In the following Timeline table, time t1 describes this state.
Timeline: Processes Join Group_A

| Time | Process P1 | Process P2 | Process P3 | Process P4 | Process P5 |
|---|---|---|---|---|---|
| t1 | Ord=1, Role=A1 | Ord=2, Role=A2 | Ord=3, Role=A2 | Ord=4, Role=A2 | Ord=5, Role=A2 |
| t2 | Ord=-1, Role=A2 (or exit) | Ord=1, Role=A1 | Ord=2, Role=A2 | Ord=3, Role=A2 | Ord=4, Role=A2 |
| t3 | Ord=5, Role=A2 | Ord=1, Role=A1 | Ord=2, Role=A2 | Ord=3, Role=A2 | Ord=4, Role=A2 |
Time t2 in the Timeline table illustrates the state when Process P1 exits or becomes disconnected from the group service. All group members receive new ordinals within the group, usually by decrementing their existing ordinals. In particular, Process P2 receives ordinal 1, enters role A1, and begins processing messages and sending responses. If Process P1 is still running while disconnected from the group service, then the group facility assigns it ordinal -1 and attempts to reconnect to the group service. Meanwhile, the program can either exit or enter the standby role A2, according to its program logic.
Time t3 illustrates the state when Process P1 restarts or reconnects to the group service. Process P1 receives the lowest unassigned ordinal (5 in this example) and operates in the corresponding role, A2. Notice that Process P1 does not resume with ordinal 1. Instead, Process P2 retains ordinal 1 and remains in role A1.
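The three timeline states can be reproduced with a small simulation. This is a toy model under stated assumptions, not the group service itself: the class `GroupSim` and its methods are hypothetical, and the renumbering rule (connected members are compacted to 1..n in their existing order) is one plausible reading of "usually decrementing."

```python
class GroupSim:
    """Toy model of ordinal assignment (hypothetical; not the real group service)."""

    def __init__(self):
        self.ordinals = {}  # process name -> current ordinal

    def join(self, name):
        # A joining or reconnecting member receives the lowest unassigned ordinal.
        used = {o for o in self.ordinals.values() if o > 0}
        ordinal = 1
        while ordinal in used:
            ordinal += 1
        self.ordinals[name] = ordinal
        return ordinal

    def disconnect(self, name):
        # The disconnected member sees ordinal -1.
        self.ordinals[name] = -1
        # Assumption: remaining members are renumbered 1..n, keeping their order.
        connected = sorted((o, n) for n, o in self.ordinals.items() if o > 0)
        for new_ordinal, (_, member) in enumerate(connected, start=1):
            self.ordinals[member] = new_ordinal


group = GroupSim()
for process in ["P1", "P2", "P3", "P4", "P5"]:
    group.join(process)
t1 = dict(group.ordinals)   # all five processes joined in order

group.disconnect("P1")
t2 = dict(group.ordinals)   # P1 disconnected; P2 takes ordinal 1

group.join("P1")
t3 = dict(group.ordinals)   # P1 rejoins with the lowest unassigned ordinal
```

Under these assumptions, `t1`, `t2`, and `t3` match the rows of the Timeline table: P2 holds ordinal 1 from t2 onward, and P1 returns at t3 with ordinal 5 rather than reclaiming ordinal 1.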