Detecting Duplicate Process Instances

Duplicate messages should be detected and discarded so that the same event is not processed more than once. Duplicate detection is performed when a process instance executes its first Checkpoint activity. You must specify a value for the duplicateKey element in the Checkpoint activity's input schema. This value should be a key, contained in the event data that starts the process, that is unique for each event. For example, the orderID value is unique for every new order.

The following describes the procedure for duplicate detection by the process engine:

Procedure 

  1. An incoming message is received and a process instance is created.

  2. Activities in the process instance are executed until the first Checkpoint activity is reached. The Checkpoint activity has a value specified for the duplicateKey input element.

  3. The process engine checks the current list of duplicateKey values for a matching value.

    1. If no process instance has stored the given duplicateKey value, the process engine stores the value and completes the Checkpoint activity.

    2. If another process instance has already stored the given duplicateKey value, the process engine terminates the process instance and throws a DuplicateException.

  4. Once the process engine stores a duplicateKey value and performs the Checkpoint for a process instance, no other Checkpoint activity in that process instance can store a different duplicateKey value.
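The steps above can be sketched in Python as follows. This is a minimal illustrative model, not the process engine's implementation: `DuplicateKeyStore`, `ProcessInstance`, and `checkpoint()` are hypothetical names, the stored values live in an in-memory dictionary, and a later Checkpoint with a different key is simply ignored, which is an assumption about how to model step 4.

```python
import threading


class DuplicateException(Exception):
    """Raised when another process instance already stored the same duplicateKey."""


class DuplicateKeyStore:
    """Hypothetical in-memory stand-in for the engine's stored duplicateKey values."""

    def __init__(self):
        self._keys = {}  # duplicateKey -> ID of the process instance that stored it
        self._lock = threading.Lock()

    def store(self, key, instance_id):
        # Check for a match and store the key as one atomic step, so two
        # instances cannot both find "no match" and both claim the same key.
        with self._lock:
            if key in self._keys:
                raise DuplicateException(
                    f"duplicateKey {key!r} already stored by {self._keys[key]}")
            self._keys[key] = instance_id


class ProcessInstance:
    """Hypothetical process instance that executes Checkpoint activities."""

    def __init__(self, instance_id, store):
        self.instance_id = instance_id
        self.store = store
        self.duplicate_key = None  # set by the first Checkpoint that stores a key
        self.terminated = False

    def checkpoint(self, duplicate_key):
        if self.duplicate_key is not None:
            # Step 4: a key is already stored for this instance; this sketch
            # ignores any later, different value (an assumption).
            return
        try:
            # Step 3: look for a matching value; store it if none exists (3.1).
            self.store.store(duplicate_key, self.instance_id)
        except DuplicateException:
            # Step 3.2: terminate this instance and surface the exception.
            self.terminated = True
            raise
        self.duplicate_key = duplicate_key


# Usage: two instances started by the same order event.
store = DuplicateKeyStore()
first = ProcessInstance("pi-1", store)
first.checkpoint("order-1001")  # no match: key stored, Checkpoint completes
second = ProcessInstance("pi-2", store)
try:
    second.checkpoint("order-1001")  # match: instance is terminated
except DuplicateException as exc:
    print(exc)
print(second.terminated)  # prints True
```

The lock is the important design choice in this model: without it, two instances checking concurrently could both see no match and both store the same key.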

Using the algorithm described above, process engines guarantee that a newly created or recovered process instance does not continue past its first Checkpoint activity if another process instance has already stored the same duplicateKey value. Therefore, choose the duplicateKey value carefully and make sure it is unique across all process instances.

Note: Duplicate detection works across multiple engines on different machines only if a database is used to store process engine data. If you are running fault-tolerant process engines (that is, only one process engine is running at a particular time), or if all process engines run on the same machine, you can use a file system for process engine storage.
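To illustrate why a shared store matters, the following Python sketch uses the standard-library sqlite3 module as a stand-in for a shared database. The table name, its columns, and the open_engine_store() and try_store_key() helpers are hypothetical and do not reflect the BusinessWorks storage schema; two connections to one SQLite file merely simulate two engines that see the same data. The PRIMARY KEY constraint makes the store reject a second copy of a duplicateKey no matter which engine inserts it.

```python
import os
import sqlite3
import tempfile


def open_engine_store(path):
    """Open a connection to the shared store (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS duplicate_keys ("
        " duplicate_key TEXT PRIMARY KEY,"
        " instance_id TEXT NOT NULL)")
    conn.commit()
    return conn


def try_store_key(conn, key, instance_id):
    """Return True if the key was stored, False if another instance holds it."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO duplicate_keys (duplicate_key, instance_id) VALUES (?, ?)",
                (key, instance_id))
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejected a duplicate: the shared
        # database, not either engine, decides which instance wins.
        return False


path = os.path.join(tempfile.mkdtemp(), "engine_store.db")
engine_a = open_engine_store(path)  # stands in for the engine on machine A
engine_b = open_engine_store(path)  # stands in for the engine on machine B
print(try_store_key(engine_a, "order-1001", "A-pi-1"))  # prints True
print(try_store_key(engine_b, "order-1001", "B-pi-7"))  # prints False
```

In this model, if each engine instead kept its own private file, neither insert would fail, which mirrors why cross-machine duplicate detection needs storage that all engines share.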

For more information on specifying process engine storage, see TIBCO ActiveMatrix BusinessWorks Administration.