Recovering After a Crash

If a process engine crashes, all process instances can be recovered up to the point of their last checkpoint. You must be careful with certain types of process starters or incoming events when placing your checkpoint in a process definition.

For example, if the process starter is waiting for an incoming HTTP request, and a checkpoint is taken after the process starts but before the response to the request is sent, the process cannot respond to the request when the process instance is restarted. The socket for the HTTP request is closed when the process engine crashes, so the Send HTTP Response activity in the restarted process returns an error. In this case, place the response activity before the checkpoint so that the response is sent before the checkpoint is taken.
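The effect of checkpoint placement in the HTTP case can be illustrated with a small simulation. This is only a sketch: ToyEngine, simulate, and the step names are hypothetical and are not part of any TIBCO API. The model assumes that a crash closes the request socket and that recovery resumes at the step after the last checkpoint.

```python
class ClosedSocketError(Exception):
    """Raised when a reply is attempted on a connection that no longer exists."""


class ToyEngine:
    """Hypothetical stand-in for a process engine; not a TIBCO API."""

    def __init__(self):
        self.socket_open = True       # connection for the incoming HTTP request
        self.checkpoint_index = None  # next step to run after recovery
        self.log = []                 # steps that completed

    def run(self, steps, start=0, crash_after=None):
        """Run steps from `start`; simulate a crash after step index `crash_after`."""
        for i in range(start, len(steps)):
            step = steps[i]
            if step == "checkpoint":
                self.checkpoint_index = i + 1
            elif step == "send_response":
                if not self.socket_open:
                    raise ClosedSocketError("socket closed by engine crash")
                self.log.append("response sent")
            else:
                self.log.append(step)
            if crash_after == i:
                self.socket_open = False  # the crash closes the request socket
                return False
        return True

    def recover(self, steps):
        """Resume from the last checkpoint, if one was taken."""
        if self.checkpoint_index is None:
            return False  # nothing to recover
        return self.run(steps, start=self.checkpoint_index)


def simulate(steps, crash_after):
    """Crash after the given step index, then attempt recovery."""
    engine = ToyEngine()
    engine.run(steps, crash_after=crash_after)
    try:
        restarted = engine.recover(steps)
    except ClosedSocketError:
        return "error on response", engine.log
    return ("recovered" if restarted else "not restarted"), engine.log


# Checkpoint before the response: the restarted instance cannot reply.
print(simulate(["checkpoint", "send_response", "update_db"], crash_after=0))
# Response before the checkpoint: the restarted instance continues cleanly.
print(simulate(["send_response", "checkpoint", "update_db"], crash_after=1))
```

In the first ordering the restarted instance fails at the response step. In the second, the response has already been sent when the checkpoint is taken, so recovery only runs the remaining work.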

There are other situations where an incoming event must be handled before the checkpoint is taken. Some of these circumstances are:

  • An email message is received, then deleted from the email server.

  • An HTTP request is received.

You should exercise care in placing checkpoints in your process definitions. Make certain that the process has all of the data required to continue at the time of the checkpoint so that in the event of a failure, a restarted process does not attempt to access resources that no longer exist.

Note: By default, checkpointed process instances are restarted when the engine restarts. If the engine encounters errors during startup, the restarted process instances continue to be processed and may eventually be lost, depending on the type of error at startup. To avoid losing checkpointed jobs in this situation, you can configure the process engine to shut down if any errors are encountered during startup. The custom engine property Engine.ShutdownOnStartupError controls this behavior. The default value of the property is false. Setting the property to true shuts the engine down if errors are encountered when the engine starts. For more information about setting custom engine properties, see TIBCO ActiveMatrix BusinessWorks™ Administration.
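As a minimal sketch, and assuming that custom engine properties are supplied as name=value pairs in an engine properties file (the file location and the exact mechanism are assumptions here; the Administration guide is the authority), the setting might look like this:

```properties
# Shut the engine down if errors occur at startup, so that
# checkpointed jobs are not lost (the default is false).
Engine.ShutdownOnStartupError=true
```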

Checkpoints and the Confirm Activity

In the case of confirmable messages (for example, a confirmable TIBCO Rendezvous or Adapter message is received), you must consider the consequences of performing a checkpoint before or after a Confirm activity.

If the checkpoint is taken before the Confirm activity and a crash occurs after the checkpoint but before the confirm, the original message is resent. In this case, the restarted process can no longer send the confirmation. However, a new process is started to handle the resent message, and you can implement your process to handle the restarted and new processes appropriately.

If the checkpoint is taken after the Confirm activity, a crash can occur after the confirm but before the checkpoint. In this case, the message is confirmed and therefore not redelivered. The process instance is also not restarted, because the crash occurred before the checkpoint, so the remaining work for that message is not performed.

If your process definition receives confirmable messages, you must consider the type of processing it performs to determine where a checkpoint is appropriate.
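The two orderings can be compared with a short model. This is only a sketch under simplifying assumptions: simulate_confirm, handle_once, and processed_ids are hypothetical names, an unconfirmed message is assumed to be redelivered, and an instance is assumed to restart only if a checkpoint was taken before the crash.

```python
def simulate_confirm(steps, crash_after):
    """Describe the outcome of a crash right after step index `crash_after`.

    `steps` is an ordered list of step names that includes
    "checkpoint" and "confirm".
    """
    done = steps[: crash_after + 1]
    return {
        "instance_restarted": "checkpoint" in done,   # recovery needs a checkpoint
        "message_redelivered": "confirm" not in done,  # unconfirmed messages are resent
    }


# Checkpoint first, crash before the confirm: the instance restarts AND
# the message is redelivered, so two processes see the same message.
print(simulate_confirm(["receive", "checkpoint", "confirm"], crash_after=1))

# Confirm first, crash before the checkpoint: no restart and no redelivery.
print(simulate_confirm(["receive", "confirm", "checkpoint"], crash_after=1))


# One possible way to cope with the first case is to make the work
# idempotent by tracking message IDs. A real process would need a
# durable store; an in-memory set is used here only for illustration.
processed_ids = set()


def handle_once(message_id, work):
    """Run `work` only if this message ID has not been handled before."""
    if message_id in processed_ids:
        return False
    work()
    processed_ids.add(message_id)
    return True
```

Tracking message IDs is just one option for handling the restarted and new processes; which approach is appropriate depends on whether duplicate processing or lost processing is the more costly outcome for the process in question.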