Passive Monitor
A program can passively track the number of active members of a group using a fault tolerance monitor. That program need not be a member of the fault tolerance group it monitors.
Monitors are passive in that they do not affect the members of the monitored group in any way. Members do not detect that a monitor exists.
A program that passively monitors a group detects the same number of active members as the members of that group detect.
Monitor Callback Function
When a program starts monitoring (by creating a monitor event), it must specify a monitor callback function as a parameter. The program must define this callback function. Whenever Rendezvous fault tolerance software detects a change in the number of active members, it calls the callback function, passing the current number of active members as an argument.
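The callback pattern can be illustrated with a small simulation. This is only a sketch: `GroupMonitor`, `report_active_count`, `on_change`, and the group name `QUOTE_SERVERS` are hypothetical names invented for illustration and are not part of the Rendezvous API. The sketch assumes the callback fires only when the observed count differs from the previously observed count.

```python
from typing import Callable, List, Optional, Tuple


class GroupMonitor:
    """Hypothetical stand-in for a fault tolerance monitor (not the Rendezvous API)."""

    def __init__(self, group_name: str, callback: Callable[[str, int], None]) -> None:
        self.group_name = group_name
        self._callback = callback
        self._active: Optional[int] = None  # unknown until the first observation

    def report_active_count(self, count: int) -> None:
        """Simulate the fault tolerance software observing the group."""
        if count != self._active:
            # The count changed, so notify the program through its callback.
            self._active = count
            self._callback(self.group_name, count)


seen: List[Tuple[str, int]] = []


def on_change(group_name: str, num_active: int) -> None:
    """Program-defined callback; receives the number of active members."""
    seen.append((group_name, num_active))


monitor = GroupMonitor("QUOTE_SERVERS", on_change)
for count in (2, 2, 1, 0, 1):
    monitor.report_active_count(count)

print(seen)
```

In this simulation the repeated observation of 2 active members does not trigger a second callback, because the count did not change.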
Monitors in Action
Monitors give Rendezvous programs limited capability to determine the health of the fault-tolerant programs upon which they depend. In the most common scenario, a client program monitors a critical service, and adapts its own behavior accordingly. Consider these examples.
Example: Monitor for Data Quality
A data display program receives many items of time-critical information from several groups of fault-tolerant broadcast producers (one group for each kind of information). The display updates every time new information arrives. The end user must be confident that the displayed information is current. The display program monitors the health of each producer group, and displays each information item with a color code indicating the quality of the information (based on the health of the corresponding producer group).
For example, if a producer group has an active member (normal operation), then all information from that group appears on a white background. If the group has no active member (a catastrophic failure), then the display marks all information from that group with a yellow background, to signal the end user that the information might be obsolete. When the producer group once again has an active member, the display shows each new item from that group on a white background, to indicate that it represents current information.
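The color-coding rule might be sketched as follows. All names here (`QualityDisplay`, `on_monitor_change`, `on_item`, the group `PRICES`, and the item names) are hypothetical and serve only to illustrate the logic. The sketch also assumes that a group is treated as unhealthy until the monitor first reports an active member, which the text does not specify.

```python
WHITE = "white"    # information is current
YELLOW = "yellow"  # information might be obsolete


class QualityDisplay:
    """Hypothetical display model: tracks a background color per item."""

    def __init__(self) -> None:
        self._active = {}  # group name -> last reported number of active members
        self.cells = {}    # (group, item name) -> [value, background]

    def on_monitor_change(self, group: str, num_active: int) -> None:
        """Monitor callback: the number of active members changed."""
        self._active[group] = num_active
        if num_active == 0:
            # Catastrophic failure: mark every item from this group.
            for (cell_group, _name), cell in self.cells.items():
                if cell_group == group:
                    cell[1] = YELLOW

    def on_item(self, group: str, name: str, value: float) -> None:
        """A new item arrived; color it by the current health of its group."""
        # Assumption: a group with no report yet counts as unhealthy.
        healthy = self._active.get(group, 0) > 0
        self.cells[(group, name)] = [value, WHITE if healthy else YELLOW]


display = QualityDisplay()
display.on_monitor_change("PRICES", 1)
display.on_item("PRICES", "ITEM_A", 101.5)
display.on_item("PRICES", "ITEM_B", 24.0)
display.on_monitor_change("PRICES", 0)      # both items turn yellow
display.on_monitor_change("PRICES", 1)      # old items stay yellow
display.on_item("PRICES", "ITEM_A", 102.0)  # only the new item turns white
print(display.cells)
```

Note that recovery alone does not reset existing items to white; only items that arrive after recovery are shown as current, which matches the behavior described above.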
Example: Monitor for Available Service
A group of query servers responds to requests from numerous client programs. Before submitting a query, a client program checks the number of active servers. If no servers are active, the client program informs the end user that it cannot submit the query. If many servers are active, the client submits the query. If only a few servers are active, the client submits the query, and informs the end user that the response may be delayed.
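The client's decision could look like the sketch below. The function name `plan_query` and the threshold `FEW_SERVERS` are assumptions made for illustration; the text does not define how many servers count as "a few," so a real client would choose a threshold that suits its service.

```python
from typing import Optional, Tuple

# Hypothetical threshold: at or below this count, warn about possible delay.
FEW_SERVERS = 2


def plan_query(num_active: int) -> Tuple[bool, Optional[str]]:
    """Decide whether to submit a query, given the monitored server count.

    Returns (submit, message); message is None when there is nothing
    to tell the end user.
    """
    if num_active == 0:
        return (False, "Cannot submit query: no servers are active.")
    if num_active <= FEW_SERVERS:
        return (True, "Query submitted; the response may be delayed.")
    return (True, None)


for count in (0, 1, 5):
    print(count, plan_query(count))
```

The `num_active` argument would be the most recent value delivered to the client's monitor callback.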
Example: Monitor to Ascertain Complete Response
Some programs use redundancy to cross-check results. Each member in a fault tolerance group computes the same information, but each uses a different program, coded by a different programming team, running on a different kind of computer hardware platform. A client program receives, collates, and compares the results of their computations, and reports to an end user.
The collator must report as soon as it receives a response from all the active members; it must not delay while waiting for a response from a member that has terminated unexpectedly. The collator monitors the group to determine the number of active members. When the number of responses equals the number of active members, the collator reports the combined results.
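A minimal sketch of this completion check follows. The class `Collator` and its method names are hypothetical. Two details are assumptions beyond the text: the sketch reports when the number of responses is greater than or equal to the number of active members (so that a member that responds and then terminates does not block the report), and it reports at most once per query.

```python
from typing import Callable, Dict, Optional


class Collator:
    """Hypothetical collator: reports once all active members have responded."""

    def __init__(self, report: Callable[[Dict[str, int]], None]) -> None:
        self._report = report
        self._num_active: Optional[int] = None  # unknown until the monitor reports
        self._responses: Dict[str, int] = {}
        self._reported = False

    def on_monitor_change(self, num_active: int) -> None:
        """Monitor callback: the number of active members changed."""
        self._num_active = num_active
        self._maybe_report()

    def on_response(self, member_id: str, result: int) -> None:
        """A group member delivered its result."""
        self._responses[member_id] = result
        self._maybe_report()

    def _maybe_report(self) -> None:
        if self._reported or self._num_active is None or not self._responses:
            return
        # Assumption: >= rather than ==, in case a responder has since terminated.
        if len(self._responses) >= self._num_active:
            self._reported = True
            self._report(dict(self._responses))


reports = []
collator = Collator(reports.append)
collator.on_monitor_change(3)   # three members are active
collator.on_response("A", 42)
collator.on_response("B", 42)
collator.on_monitor_change(2)   # one member terminated unexpectedly
print(reports)
```

In this run the collator does not wait for a third response; it reports as soon as the monitor shows that only two members remain active.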