Timeouts During Maintenance

During grid maintenance, especially when tibdgproxy or tibdgnode processes are stopped, an ActiveSpaces client application can experience timeouts on requests it has made to the grid. The client application should be prepared to handle these timeout errors, for example by logging the error or retrying the request.
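One way a client might handle such timeouts is to log each one and retry with an increasing delay. The following Python sketch is illustrative only: GridTimeoutError and call_with_retry are hypothetical names, not part of the ActiveSpaces client API, and the attempt count and delays are arbitrary assumptions. In a real application, the except clause would catch whatever timeout error the client library in use actually reports.

```python
import logging
import time


class GridTimeoutError(Exception):
    """Hypothetical stand-in for the timeout error reported by the client API."""


def call_with_retry(operation, max_attempts=5, initial_delay=0.5,
                    backoff=2.0, sleep=time.sleep):
    """Run operation(), logging and retrying when it times out.

    The delay between attempts grows by the backoff factor each time.
    The last timeout is re-raised once max_attempts is exhausted.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except GridTimeoutError as exc:
            logging.warning("Grid request timed out (attempt %d of %d): %s",
                            attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay *= backoff
```

A request would be wrapped as call_with_retry(lambda: do_request()), where do_request is a placeholder for the application's own grid call. Because a request that timed out on the client may still have been applied by the grid, a wrapper like this is safest around operations that can be repeated without side effects.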

For example, stopping a primary node causes requests to time out until the secondary node detects that the primary node is gone and takes over as the new primary node for that copyset. In addition, stopping a synchronized secondary node can cause timeouts until the primary node can successfully update the state keeper to indicate that the secondary node is out of sync.

When you restart a secondary node that is out of sync, a background synchronization process takes place. During the synchronization process, live operations arriving at the grid do not time out. Once the background synchronization of the secondary node is complete, the secondary node performs a small internal final step with the primary node.

If any operations time out during this final step, the client application must handle them.

The -promote argument to the tibdg node stop command minimizes the time it takes to stop a primary node and promote a synchronized secondary node in its place.

For more information about the -promote option, see Selecting a Secondary Node to be Promoted as the Primary Node.