Copyright © TIBCO Software Inc. All Rights Reserved



Understanding Health Metric Rules
Health metrics are available for cluster members at each level: cluster, machine, process, and agent. The Cluster Overview panel in MM provides information about the overall health of the cluster and of each cluster member. See Cluster Overview for an example.
Health is defined in terms of the following three health levels:
The colored bulb icons are currently used only in the overall cluster health metric and in alerts. The use of icons is not configurable.
Health metric thresholds are set in the MM.cdd file, which you edit in TIBCO BusinessEvents Studio.
Note the following main points:
Cluster Member Paths
You define the scope of a metric using a cluster member path, for example:
site/cluster/machine/process/inference
A cluster member in this context is a type of cluster node. The path is a hierarchy with specified names for types of cluster node: site, cluster, machine, process. Below the process level, you can specify types of agent: inference, query, cache, and dashboard.
Each path specifies a type of cluster member. The valid paths are as follows:
site/cluster
site/cluster/machine
site/cluster/machine/process
site/cluster/machine/process/inference
site/cluster/machine/process/query
site/cluster/machine/process/cache
site/cluster/machine/process/dashboard
Child Cluster Member Paths
In addition to the above values, when you construct a Child Cluster Member path in the Health Metric Rule Configuration panel, you can use a wildcard character (*). You cannot specify individual agent instances.
To reference all agents in the system, use the wildcard character after the process level:
site/cluster/machine/process/*
To reference all agents of a particular type, add the type and then specify the wildcard character:
site/cluster/machine/process/inference/*
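The way the two wildcard forms widen or narrow a rule can be sketched in a few lines of Python. This is only an illustration of the path conventions described above, not code from MM: the function name agent_types_for and the constants are hypothetical, and the sketch assumes that a path ending in an agent type, with or without a trailing wildcard, covers every agent of that type.

```python
# Illustrative only: names below are hypothetical, not part of MM.
AGENT_TYPES = {"inference", "query", "cache", "dashboard"}
PROCESS_LEVELS = ["site", "cluster", "machine", "process"]


def agent_types_for(path):
    """Return the set of agent types that a child cluster member path covers."""
    parts = path.split("/")
    if parts[:4] != PROCESS_LEVELS:
        raise ValueError("path must start with site/cluster/machine/process")
    rest = parts[4:]
    if rest == ["*"]:
        # Wildcard directly after the process level: all agents.
        return set(AGENT_TYPES)
    if rest and rest[0] in AGENT_TYPES and rest[1:] in ([], ["*"]):
        # An agent type, optionally followed by the wildcard: that type only.
        return {rest[0]}
    raise ValueError("not an agent-level path: " + path)


all_agents = agent_types_for("site/cluster/machine/process/*")
inference_only = agent_types_for("site/cluster/machine/process/inference/*")
```

In this sketch, all_agents contains all four agent types, while inference_only contains only inference. A path that stops at the process level is rejected because it does not name any agents.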
Two Types of Thresholds
You can compute health for a cluster member using either of two methods.
Health of Child Cluster Members
One method computes the threshold as a percentage of active (or inactive) specified child cluster members. You can optionally specify that only those child cluster members that are themselves at a certain health level are used when computing the threshold. For example, you could set up a threshold such that the overall cluster health level is set to warning when fifty percent or more of the agents of any type are at health level warning.
Number of Alerts
The other method computes the threshold as a number of alerts of a given severity for the cluster member, during a given time period.
Which Type to Use for Different Cluster Members
In general these are the guidelines for use of these methods:
Using Health of Child Members
Thresholds based on the health of child members can use child member health levels or child member activity status (active or inactive), or both. You can also set a threshold value such that the health level of the parent is set only if a minimum percentage of child members satisfies the specified condition.
For example, if you are setting up thresholds for site/cluster/machine, you might select site/cluster/machine/process as the child member type. You might specify that the health level should be set to warning at the machine level if any process unit on that machine has a health level of warning. Or you might set the health level of a machine to critical if any of its process units is inactive.
You can also use different child members when configuring each health level for a parent member, depending on your need.
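The following Python sketch illustrates the kind of calculation a child-member threshold implies. It is not the MM implementation. The Child class, the function names, and the health level string "normal" are placeholders, and the sketch assumes that the percentage is taken over all child members of the selected type and that a rule applies when the percentage meets or exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical stand-in for one child cluster member."""
    active: bool
    health_level: str  # for example "warning" or "critical"


def matching_percent(children, active=None, health_level=None):
    """Percentage of children that satisfy every condition that was given.

    Assumption: the percentage is taken over all children of the chosen type.
    """
    if not children:
        return 0.0
    hits = sum(
        1
        for c in children
        if (active is None or c.active == active)
        and (health_level is None or c.health_level == health_level)
    )
    return 100.0 * hits / len(children)


def rule_fires(children, threshold_percent, **conditions):
    """True when the matching share meets or exceeds the threshold."""
    return matching_percent(children, **conditions) >= threshold_percent


# Four process units on one machine ("normal" is a placeholder level name).
process_units = [
    Child(active=True, health_level="normal"),
    Child(active=True, health_level="warning"),
    Child(active=False, health_level="critical"),
    Child(active=True, health_level="warning"),
]

half_at_warning = rule_fires(process_units, 50, health_level="warning")
any_inactive = matching_percent(process_units, active=False) > 0
```

With these four process units, half are at warning, so a fifty percent warning rule would apply to the machine. One unit is inactive, so an "any process unit is inactive" condition would also hold.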
Using Number and Frequency of Alerts
To define the threshold for a cluster member’s health level using alerts, you define which alert severity level to use, and the frequency of alerts received during a specified time period.
All alerts of a specified severity defined for the cluster member are counted.
MM begins a count after it receives the first alert for the specified cluster member. After the time specified in Range has elapsed, the application counts the number of alerts of the specified severity that were received during this period. If the count meets or exceeds the threshold, the health indicator is changed to the health level specified for this rule.
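The counting behavior described above can be sketched as follows. This is a simplified model of a single counting window, not MM code. The function name, the tuple layout of an alert, and the treatment of the window end as exclusive are assumptions made for illustration, and what MM does after a window closes is not modeled.

```python
def health_after_window(alerts, severity, range_seconds, threshold, health_level):
    """Evaluate one alert-count rule for one cluster member.

    alerts: list of (timestamp_in_seconds, severity) tuples (hypothetical layout).
    Returns health_level if the rule applies, otherwise None.
    """
    if not alerts:
        return None
    # The window opens when the first alert for the member arrives.
    start = min(timestamp for timestamp, _ in alerts)
    end = start + range_seconds  # assumption: the end of the window is exclusive
    count = sum(
        1
        for timestamp, alert_severity in alerts
        if start <= timestamp < end and alert_severity == severity
    )
    # The rule applies when the count meets or exceeds the threshold.
    return health_level if count >= threshold else None


# Five critical alerts and one warning alert within five minutes (300 seconds).
sample_alerts = [
    (0, "critical"),
    (60, "critical"),
    (100, "warning"),
    (120, "critical"),
    (180, "critical"),
    (240, "critical"),
]
result = health_after_window(sample_alerts, "critical", 300, 5, "warning")
```

Here result is "warning" because five critical alerts fall inside the 300-second window. The warning alert at 100 seconds is ignored because its severity does not match the rule.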
Health Metric Rule Examples
Below are some examples to help you think about the way you want to configure your health metric rules.
Examples Using Child Cluster Member Health Metrics
These examples show how rules can be configured to display a health level indicator on a cluster member based on the health levels of its child members.
Note  These rules could be set on any parent cluster member of the specified child members. The parent member is not shown in the examples. The scope of the rule is wider for parent members higher in the cluster member hierarchy.
To set the health level to critical if a single inference agent is inactive
Set Path to site/cluster/machine/process/inference
Add a property called active whose value is false
To set the health level to critical if all agents are inactive
Set Path to site/cluster/machine/process/*
Add a property called active whose value is false
To set the health level to warning if fifty percent of agents are inactive
Set Path to site/cluster/machine/process/*
Add a property called active whose value is false.
To set the health level to critical if all agents are inactive
Set Path to site/cluster/machine/process/*
Add a property called active whose value is true
To set the health level to warning if thirty percent of inference agents have a health level of warning
Set Path to site/cluster/machine/process/inference
Add a property called healthLevel whose value is warning
Examples Using Alerts
These examples show how rules can be configured to display a health level indicator for a cluster member based on the number of alerts received in a time window. Unlike the child cluster member examples, these examples show the cluster member path. The path is used in both types of rules, but it is more relevant here because alerts are counted for the cluster member itself.
To set the health level to warning if one critical alert is received for a cluster
Set Cluster Member Path to site/cluster
Add a property called severity whose value is critical
To set the health level to warning if 5 or more critical alerts are received within a window of 5 minutes, for a query agent
Set Cluster Member Path to site/cluster/machine/process/query
Add a property called severity whose value is critical
