Two Types of Thresholds
You can compute the health of a cluster member using either of two methods: Health of Child Cluster Members or Number of Alerts.
- Health of Child Cluster Members
  This method computes the threshold as a percentage of the specified child cluster members that are active (or inactive). You can optionally specify that only those child cluster members that are themselves at a certain health level are used when computing the threshold. For example, you could set up a threshold such that the overall cluster health level is set to warning when fifty percent or more of the agents of any type are at health level “Warning.”
- Number of Alerts
  This method computes the threshold as a number of alerts of a given severity received for the cluster member during a given time period.
The following guidelines are used to decide which method to implement for different cluster members:
- Use Health of Child Members to compute overall cluster health and machine-level health.
- Use Number and Frequency of Alerts to compute the health of processes and agents.
Health of Child Members
Thresholds based on the health of child members can use child member health levels or child member activity status (active or inactive), or both. You can also set a threshold value such that the health level of the parent is set only if a minimum percentage of child members satisfies the specified condition.
For example, if you are setting up thresholds for site/cluster/machine, you might select site/cluster/machine/process as the child member type. You might specify that the health level at the machine level should be set to warning if any process unit on that machine has a health level of warning. Or you might set the health level of a machine to critical if any of its process units is inactive.
You can also use different child members when configuring each health level for a parent member, depending on your need.
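The child-member rules described above can be sketched in Python. This is an illustration only, not the product's implementation: the names `ChildMember` and `threshold_met` are hypothetical, and the sketch assumes that the percentage is computed over all child members passed in and that a minimum percentage of 0 means "any child member satisfies the condition."

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChildMember:
    """Hypothetical stand-in for a child cluster member."""
    name: str
    health: str   # for example "normal", "warning", or "critical"
    active: bool


def threshold_met(children: Sequence[ChildMember],
                  min_percent: float,
                  health: Optional[str] = None,
                  active: Optional[bool] = None) -> bool:
    """Return True if enough child members satisfy the condition.

    A child matches when it has the given health level (if one is given)
    and the given activity status (if one is given). At least one child
    must match, so min_percent=0 behaves like "any child matches".
    """
    if not children:
        return False
    matching = [
        c for c in children
        if (health is None or c.health == health)
        and (active is None or c.active == active)
    ]
    percent = 100.0 * len(matching) / len(children)
    return len(matching) > 0 and percent >= min_percent


process_units = [
    ChildMember("pu1", "normal", True),
    ChildMember("pu2", "warning", True),
    ChildMember("pu3", "normal", False),
]

# Machine goes to warning if any process unit is at warning.
machine_warning = threshold_met(process_units, 0, health="warning")
# Machine goes to critical if any process unit is inactive.
machine_critical = threshold_met(process_units, 0, active=False)
# Parent goes to warning only if at least 50% of children are at warning.
half_at_warning = threshold_met(process_units, 50, health="warning")
```

With the sample data, the first two rules are satisfied (one process unit is at warning and one is inactive), while the fifty-percent rule is not, because only one of three process units is at warning.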
Number and Frequency of Alerts
To define the threshold for a cluster member’s health level using alerts, you define which alert severity level to use, and the frequency of alerts received during a specified time period.
All alerts of a specified severity defined for the cluster member are counted.
MM begins a count after it receives the first alert for the specified cluster member. After the time specified in Range has elapsed, the application counts the alerts of the specified severity that were received during this period. If the count meets or exceeds the threshold, the health indicator is changed to the health level specified for this rule.
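The counting behavior can be sketched as follows. This is a simplified illustration under stated assumptions, not MM's actual logic: the function name `evaluate_alert_rule` and its parameters are hypothetical, timestamps are plain seconds, the input contains only alerts of the rule's severity for one cluster member, and only the first counting period is evaluated. Whether an alert arriving exactly at the end of Range is counted, and how later periods are handled, are not covered by the description above; the sketch excludes the boundary.

```python
from typing import Iterable, Optional


def evaluate_alert_rule(alert_times: Iterable[float],
                        range_seconds: float,
                        threshold: int,
                        health_level: str) -> Optional[str]:
    """Return health_level if the rule fires, otherwise None.

    alert_times holds the arrival times (in seconds) of alerts of the
    rule's severity for a single cluster member.
    """
    times = sorted(alert_times)
    if not times:
        return None  # no alert received, so no count was started

    # The count starts with the first alert and covers one Range period.
    window_start = times[0]
    window_end = window_start + range_seconds  # assumed exclusive

    count = sum(1 for t in times if window_start <= t < window_end)
    return health_level if count >= threshold else None


# Three alerts arrive within 60 seconds of the first one; a fourth arrives later.
fired = evaluate_alert_rule([0, 10, 20, 70], range_seconds=60,
                            threshold=3, health_level="critical")
# Only two alerts fall inside the period, so the rule does not fire.
not_fired = evaluate_alert_rule([0, 10, 70], range_seconds=60,
                                threshold=3, health_level="critical")
```

In the first call the count (3) meets the threshold, so the function returns "critical"; in the second call the count (2) is below the threshold, so it returns None.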