Recovering From a BPM Node Crash

Restarting the BPM node after a crash may sometimes fail and/or stop process work because of various reasons. This may be because some work items are missing or the BPM application changes to a ’Partially Running’ runtime state.

These problems may occur because, when a node crashes, the process is killed. However, the XA transactions may remain active and block some engine threads within the database against orphan transactions. When the node is restarted and the orphan transactions are not cleared, some of the process cannot be restarted and result in missing workitems.

Procedure

  • To resolve this issue, ensure that you clear out any transactions that have remained active after the process was killed before restarting the processes. For example, to resolve the issue on a system running against a Microsoft SQL Server database, do the following:
    1. Run the following SQL query to retrieve all the orphaned transactions for a database. In this example, the database ID is 5.
      SELECT a.transaction_id, transaction_uow, transaction_begin_time
      FROM sys.dm_tran_database_transactions a, sys.dm_tran_active_transactions b
      WHERE a.transaction_id=b.transaction_id
      AND transaction_type=4
      AND database_id=5
      AND a.transaction_id NOT IN (select transaction_id from sys.dm_exec_requests WHERE database_id=5)
      ORDER BY transaction_begin_time
    2. Run the following SQL script to generate the KILL commands for the orphaned transactions for a specified database. Run all the SQL KILL commands to rollback all the orphaned transactions.
      DECLARE @database_id AS INTEGER;
      DECLARE @uow AS VARCHAR(128);
      SET @database_id = 5; -- SELECT DB_ID() AS [Database ID];
      DECLARE trans_cursor CURSOR FOR select transaction_uow from sys.dm_tran_database_transactions a, sys.dm_tran_active_transactions b
      	 WHERE a.transaction_id=b.transaction_id
      	 AND transaction_type=4
      	 AND database_id=@database_id
      	 AND a.transaction_id NOT IN (select transaction_id from sys.dm_exec_requests WHERE database_id=5);
      OPEN trans_cursor;
      FETCH NEXT FROM trans_cursor INTO @uow
      WHILE (@@FETCH_STATUS <> -1)
      BEGIN
      	PRINT 'KILL ''' + @uow + ''''; -- WITH STATUSONLY';
      	FETCH NEXT FROM trans_cursor INTO @uow
      END;
      CLOSE trans_cursor;
      DEALLOCATE trans_cursor;
      --select * from sys.dm_tran_locks
       --GO
       --select * from sys.dm_exec_requests
       --WHERE command in ('UPDATE', 'DELETE', 'SELECT', 'INSERT')
       --Go
       --select * from sys.dm_exec_requests
       --WHERE database_id=5
       --GO
       --select * from sys.dm_tran_active_transactions
       --WHERE transaction_type=4
       --ORDER BY transaction_begin_time
       --GO
       --select a.transaction_id, transaction_uow, transaction_begin_time from sys.dm_tran_database_transactions a, sys.dm_tran_active_transactions b
       --WHERE a.transaction_id=b.transaction_id
       --AND transaction_type=4
       --AND database_id=5
       --AND a.transaction_id NOT IN (select transaction_id from sys.dm_exec_requests WHERE database_id=5)
       --ORDER BY transaction_begin_time
       --GO
       --select * from sys.dm_tran_session_transactions