Overview of Error Handling

Active Cluster is resilient to connection and node failures. In most cases, such failures are resolved automatically:

• If a node goes down, the rest of the group continues to operate as a single cluster. After the node is restored, it automatically reconnects with the rest of the group.

• If a connection failure occurs and the cluster becomes partitioned into subgroups, each subgroup operates as a separate cluster with its own timekeeper. After the connection is restored, the subgroups automatically merge to form a single group with a single timekeeper.
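The partition and merge behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Active Cluster is implemented: the function names `subgroups` and `timekeepers`, the integer node IDs, and the link set are all hypothetical, and the rule "lowest node ID becomes timekeeper" is an arbitrary placeholder, because this page does not describe how a timekeeper is actually chosen.

```python
from typing import Dict, List, Set, Tuple


def subgroups(nodes: Set[int], links: Set[Tuple[int, int]]) -> List[Set[int]]:
    """Group nodes into connected subgroups, given the links that still work."""
    neighbours: Dict[int, Set[int]] = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen: Set[int] = set()
    groups: List[Set[int]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk collects every node reachable from `start`.
        group: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def timekeepers(groups: List[Set[int]]) -> List[int]:
    """One timekeeper per subgroup.

    ASSUMPTION: the lowest node ID is used purely for illustration;
    the real selection rule is not described in this document.
    """
    return [min(group) for group in groups]


# Hypothetical four-node cluster connected in a chain.
nodes = {1, 2, 3, 4}
all_links = {(1, 2), (2, 3), (3, 4)}

healthy = subgroups(nodes, all_links)                  # one group
partitioned = subgroups(nodes, all_links - {(2, 3)})   # link 2-3 fails
merged = subgroups(nodes, all_links)                   # link restored
```

In this model, losing the link between nodes 2 and 3 yields two subgroups, each with its own timekeeper, and restoring the link yields a single group with a single timekeeper again.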

If cluster problems persist, such as nodes that frequently show a DISCONNECTED, BLOCKED, or BLOCKING status, refer to Troubleshooting.

Get Cluster Status

You can get an overall picture of the activity of a cluster on the CLUSTER MANAGEMENT page in the Manager. Nodes that are offline or down for any reason have the status DISCONNECTED. For more information, see Viewing the Status of an Active Cluster.

The cluster status view depends on which node you are viewing in the browser (the “local node”). The local node is the one with a number in brackets in the Status column (second from the left); for example, “[36]”.

Only a local node can have the status of OPERATIONAL; all remote nodes that are operational have the status of CONNECTED_READY.
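The following minimal sketch shows why the same cluster looks different depending on which node you are viewing. It models only the three statuses mentioned so far (OPERATIONAL, CONNECTED_READY, DISCONNECTED); the function `status_view`, the node names, and the `is_up` input are hypothetical and are not part of any product API.

```python
from typing import Dict


def status_view(local_node: str, is_up: Dict[str, bool]) -> Dict[str, str]:
    """Return the status each node would show when viewed from `local_node`.

    Hypothetical model: only the local node can be OPERATIONAL; a remote
    node that is running shows CONNECTED_READY; a node that is offline or
    down shows DISCONNECTED. Other statuses (BLOCKED, BLOCKING) are not
    modeled here.
    """
    view: Dict[str, str] = {}
    for node, up in is_up.items():
        if node == local_node:
            view[node] = "OPERATIONAL"
        elif up:
            view[node] = "CONNECTED_READY"
        else:
            view[node] = "DISCONNECTED"
    return view


# Made-up sample cluster: node-c is down.
cluster = {"node-a": True, "node-b": True, "node-c": False}

view_from_a = status_view("node-a", cluster)
view_from_b = status_view("node-b", cluster)
```

Here `view_from_a` reports node-a as OPERATIONAL and node-b as CONNECTED_READY, while `view_from_b` reports the reverse; node-c is DISCONNECTED in both views.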

When troubleshooting, log in to each node in turn, because the view from a single node can be misleading. For example, a node can report its own status as OPERATIONAL while it is blocking other nodes. In this case, the other nodes have a status of BLOCKED, which means that the OPERATIONAL node is the one blocking them and needs repair.
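If you record the status view from each node by hand, a check like the one below can help spot the pattern described above. This is a sketch under assumptions: the `views` structure, the node names, and the function `suspect_blockers` are hypothetical, and the sample statuses are made-up data rather than output from a real cluster.

```python
from typing import Dict, List


def suspect_blockers(views: Dict[str, Dict[str, str]]) -> List[str]:
    """Return local nodes that look OPERATIONAL while other nodes are BLOCKED.

    `views` maps each local node to the statuses seen when logged in to it
    (hypothetical structure, filled in manually from each node's view).
    """
    suspects: List[str] = []
    for local, view in views.items():
        others_blocked = [
            node for node, status in view.items()
            if node != local and status == "BLOCKED"
        ]
        if view.get(local) == "OPERATIONAL" and others_blocked:
            suspects.append(local)
    return sorted(suspects)


# Made-up views recorded while logged in to node-a and node-b.
views = {
    "node-a": {"node-a": "OPERATIONAL", "node-b": "BLOCKED", "node-c": "BLOCKED"},
    "node-b": {"node-a": "CONNECTED_READY", "node-b": "OPERATIONAL", "node-c": "CONNECTED_READY"},
}

suspects = suspect_blockers(views)
```

With this sample data, `suspects` contains only node-a, which would be the first node to inspect for repair.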