Fault Tolerance in Action
Rendezvous fault tolerance software uses Rendezvous communications software to communicate among processes.
Components and Operating Environment
Figure 30, Fault Tolerance Operating Environment, illustrates the components of a fault-tolerant distributed program in a typical network operating environment. Four identical program processes (A1, A2, A3, A4) run on four separate computers connected by a network. The program incorporates the Rendezvous API library (including fault tolerance and communications software); a Rendezvous daemon process mediates between each program process and the network.
Figure 30: Fault Tolerance Operating Environment
Example Fault-Tolerant Multicast Producer
In one scenario, a program multicasts a stream of messages to the network (for example, stock market prices or news stories). Other programs on the network listen for those messages and consume the information.
Only one of the four producer processes (A1) actively sends information. The other three are backup processes; they each compute an information stream that is parallel to that of the active process, but they do not send the information. If the active process stops functioning, Rendezvous fault tolerance software directs one of the three backup processes to begin active multicasting in its place. In this way the redundant processes cooperate to provide a fault-tolerant service without flooding the network with redundant information.
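The active/backup behavior described above can be illustrated with a small simulation. This is only a sketch of the idea and does not use the Rendezvous API: the names Producer, Group, elect, fail, and tick are hypothetical, the "network" is a plain Python list, and the rule for choosing the replacement (the first live member in list order) is an assumption made for illustration, not the ranking that Rendezvous fault tolerance software applies.

```python
from dataclasses import dataclass


@dataclass
class Producer:
    """One of the identical producer processes (hypothetical model)."""
    name: str
    alive: bool = True
    active: bool = False
    sequence: int = 0

    def step(self, network):
        # Every live member computes the next item in the stream,
        # but only the active member sends it to the network.
        self.sequence += 1
        if self.active:
            network.append((self.name, self.sequence))


class Group:
    """Stands in for the fault tolerance software that coordinates members."""

    def __init__(self, names):
        self.members = [Producer(n) for n in names]
        self.elect()

    def elect(self):
        # Assumption for illustration: if no live member is active,
        # activate the first live member in list order.
        live = [m for m in self.members if m.alive]
        if live and not any(m.active for m in live):
            live[0].active = True

    def fail(self, name):
        # Mark a member as stopped, then direct a backup to take over.
        for m in self.members:
            if m.name == name:
                m.alive = False
                m.active = False
        self.elect()

    def tick(self, network):
        for m in self.members:
            if m.alive:
                m.step(network)


def simulate():
    network = []
    group = Group(["A1", "A2", "A3", "A4"])
    group.tick(network)   # A1 sends item 1; A2, A3, A4 compute silently
    group.tick(network)   # A1 sends item 2
    group.fail("A1")      # A1 stops; a backup becomes active
    group.tick(network)   # the new active member sends item 3
    return network


if __name__ == "__main__":
    print(simulate())
```

Because each backup keeps computing the stream in parallel, the member that takes over continues from the current position, and the network carries each item only once.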