Live Backup and Restore

The TIBCO ActiveSpaces live backup and restore feature uses checkpoints to create a consistent, grid-wide backup of a running data grid. A checkpoint is a set of persistent files containing the state and data from a single data grid at a specific point in time. A checkpoint can then be used to restore a complete data grid on the same machines, or to move the entire data grid to different machines.

The backup and restore procedure described in this document can only be used when a single data grid is running in a realm. No other processes can be configured in the realm; otherwise, the restore might be corrupted. The backup of the data grid is taken while the processes of the data grid are running. To restore the data grid processes from a backup, all processes of the data grid are first stopped, and then the entire data grid is fully restored from the backup.

To take a backup of an ActiveSpaces data grid, you must take a backup of the following processes:
  • Realm Service
  • ActiveSpaces State Keepers
  • ActiveSpaces Nodes of each copyset

When a data grid has to be restored, you must ensure that the following components are restored:
  • The realm service database
  • The data grid configuration in the realm service
  • The state keepers
  • The nodes of each copyset

The data used to restore the grid configuration, state keepers and nodes must be from the same backup. Before creating a backup of the data grid, you can optionally switch the data grid to maintenance mode to prevent writes from occurring when backing up the data grid. For details, see Preventing Data Loss by Using the Maintenance Mode.

Backup Data Locations

Realm service
An ActiveSpaces data grid runs inside a TIBCO FTL realm. A realm comprises all the administrative definitions and configurations that enable communication among the processes of the data grid and its clients. A realm service contains the complete realm definition. For more details, see the "Processes in ActiveSpaces" section in the TIBCO ActiveSpaces® Concepts guide.

Each realm service has a set of working data files that contain the configuration information about the FTL realm and the ActiveSpaces data grid. Each realm service stores its data files in its own location. By default, a realm service stores its data files in the current directory. You can specify a different directory by passing the --data command-line option when starting the tibrealmserver executable. If you stop a realm service and then restart it, the realm service reads its configuration from the existing data files.

If you are using TIBCO FTL 5.4.1, remember that in a realm only the primary realm service can accept realm configuration updates. The primary realm service deploys its current realm definition to its backup realm service and satellite realm services. Backup and satellite realm services cannot directly accept realm configuration updates from administrators or ActiveSpaces. For more information about the types of realm services, see the "Server Roles and Relationships" section in the TIBCO FTL® Administration guide.
Note: if you are using TIBCO FTL 6.0 or later, any primary server in the cluster can accept realm configuration updates.

Checkpoints and Realm Services

When an ActiveSpaces checkpoint is created, a backup copy of the primary realm service's database is made, and the realm service configuration is also saved as part of the checkpoint. The backup is named with a timestamp that reflects the time at which it was created. The backup file is created in the following directory of the primary realm service:
<realm_service_data_dir>/backups
The backup of the database and realm configuration can then be used to restore the primary realm service. Creating a checkpoint fails if a realm service is not reachable. For example, a checkpoint is not created if both the primary and backup realm services are down.
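Because each realm service backup is named with a timestamp, a restore script may need to locate the most recent one. The following Python sketch is a hypothetical helper (latest_realm_backup is not part of ActiveSpaces or FTL). It assumes that the backups subdirectory contains only backup entries and, because the timestamp format of the backup names is not described here, it picks the newest entry by file modification time.

```python
from pathlib import Path
from typing import Optional


def latest_realm_backup(realm_service_data_dir: str) -> Optional[Path]:
    """Return the newest entry in <realm_service_data_dir>/backups, or None.

    Hypothetical helper. Assumption: "newest" is judged by modification
    time, since the timestamp format of the backup names is not documented
    in this section.
    """
    backups_dir = Path(realm_service_data_dir) / "backups"
    if not backups_dir.is_dir():
        return None
    entries = list(backups_dir.iterdir())
    if not entries:
        return None
    # Choose the entry that was modified most recently.
    return max(entries, key=lambda p: p.stat().st_mtime)
```

For example, latest_realm_backup("/opt/realm_data") would inspect /opt/realm_data/backups, where /opt/realm_data stands for whatever directory was passed to tibrealmserver with --data.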
State Keepers
ActiveSpaces state keepers store internal governing state information about your data grid. Each state keeper maintains a copy of this internal state information in a file on disk. While defining the grid configuration, you specify the location of the state keeper files by using the --dir configuration option. By default, the current directory is used to store the state keeper disk files. For example:
keeper create --dir ./k_0_data k_0
When a state keeper is first started, it receives the initial grid configuration from the realm service. While the data grid is running, the state keepers record the current running state of the data grid. If you stop and restart a state keeper, the state keeper process uses the data files from its data directory to recover the data grid's running state.

In a fault tolerant set of state keepers, one of the state keepers is designated the lead state keeper. If the lead state keeper goes down, one of the remaining state keepers takes over as the lead. In a fault tolerant set of state keepers, a quorum of 2 state keepers must be running to ensure data consistency in split-brain scenarios. If a state keeper is restarted while a quorum is running, one of the running state keepers updates the restarted state keeper's state.

Checkpoints and State Keepers

When an ActiveSpaces checkpoint is created, the state keeper's internal governing state information is also saved as part of the checkpoint. This checkpoint data file can then be used to restore a state keeper's state when the state keeper is restarted. Creating a checkpoint fails if a quorum of state keepers is not running.
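The two checkpoint preconditions described so far, a reachable realm service and a running quorum of state keepers, can be modeled as a simple check. This Python sketch is illustrative only: has_quorum and can_create_checkpoint are hypothetical names, and the sketch assumes that a quorum means a strict majority of the state keepers, which gives the quorum of 2 for a fault tolerant set of three state keepers.

```python
def has_quorum(running_keepers: int, total_keepers: int) -> bool:
    """Assumed rule: a quorum is a strict majority of the state keepers.

    For a set of three state keepers this evaluates to 2.
    """
    if total_keepers <= 0:
        raise ValueError("total_keepers must be positive")
    return running_keepers >= total_keepers // 2 + 1


def can_create_checkpoint(realm_service_reachable: bool,
                          running_keepers: int,
                          total_keepers: int) -> bool:
    """Hypothetical model of the checkpoint preconditions described above."""
    return realm_service_reachable and has_quorum(running_keepers, total_keepers)
```

With the realm service reachable and two of three state keepers running, the check passes; with only one of three running, it fails, matching the rule that a checkpoint cannot be created without a quorum.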
Nodes
Each ActiveSpaces node stores rows of data for the tables that are defined for the data grid. The rows of data are stored in memory and on disk. While defining the grid configuration, you specify the location of the node files by using the --dir configuration option. By default, the current directory is used to store the node's disk files. For example, the following command specifies that the node stores its disk files under the top-level directory ./cs1_n1_data:
node create --copyset cs1 --dir ./cs1_n1_data cs1_n1
The location where the node's disk files are stored is referred to as the node's data directory. The node's data directory contains the following subdirectories:
  • live - holds the disk files that contain the data stored on the node
  • checkpoints - holds the checkpoint related subdirectories and files

Checkpoints and Nodes

When an ActiveSpaces checkpoint is created, the files needed to restore each node of the data grid are stored in the checkpoints subdirectory of that node's data directory. Each running node saves its current state to the following directory:
<node_data_dir>/checkpoints/<timestamp>_<epoch>_<counter>_<checkpoint_name>/data
Additionally, each node of the first copyset defined in your data grid's configuration saves the data grid's configuration from the primary realm service and the data grid's internal state from the state keepers to the following directory:
<node_data_dir>/checkpoints/<timestamp>_<epoch>_<counter>_<checkpoint_name>/metadata
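A script that inspects checkpoints can split a checkpoint directory name into its components and build the data and metadata paths shown above. In this Python sketch, parse_checkpoint_dir and checkpoint_paths are hypothetical helpers. The sketch assumes that the timestamp portion of the name contains no underscore; the checkpoint name itself may contain underscores.

```python
from pathlib import Path
from typing import NamedTuple, Tuple


class CheckpointDir(NamedTuple):
    timestamp: str
    epoch: int
    counter: int
    name: str


def parse_checkpoint_dir(dirname: str) -> CheckpointDir:
    """Split <timestamp>_<epoch>_<counter>_<checkpoint_name> into its parts.

    maxsplit=3 keeps any underscores inside the checkpoint name intact.
    Assumption: the timestamp itself contains no underscore.
    """
    timestamp, epoch, counter, name = dirname.split("_", 3)
    return CheckpointDir(timestamp, int(epoch), int(counter), name)


def checkpoint_paths(node_data_dir: str, dirname: str) -> Tuple[Path, Path]:
    """Return the (data, metadata) directories of one checkpoint on one node."""
    base = Path(node_data_dir) / "checkpoints" / dirname
    return base / "data", base / "metadata"
```

Only nodes of the first copyset are expected to have a populated metadata directory, so a caller should check whether that path exists before reading from it.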
The checkpoint epoch is always zero unless there has been a disaster recovery failover to another data grid. The checkpoint counter is incremented each time a checkpoint is created. If your data grid is configured with a copyset_size greater than 1, the nodes of the first copyset defined for your data grid have identical copies of the metadata files, which include the statekeeper-recovery files.
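Before a restore, you might confirm that the metadata copies on the nodes of the first copyset really are identical. The following Python sketch compares SHA-256 digests of every file under each metadata directory. It assumes that "identical" means byte-for-byte identical files; tree_digest and metadata_copies_match are hypothetical helpers, not ActiveSpaces tools.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digest(directory: Path) -> Dict[str, str]:
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    digests = {}
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            rel = path.relative_to(directory).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def metadata_copies_match(metadata_dirs: List[Path]) -> bool:
    """True if every metadata directory holds the same files with the same bytes.

    Assumption: identical copies are byte-for-byte identical.
    """
    if not metadata_dirs:
        return True
    reference = tree_digest(metadata_dirs[0])
    return all(tree_digest(d) == reference for d in metadata_dirs[1:])
```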
Copysets
A copyset defines a relationship between multiple nodes for the purposes of data replication. If more than one node is defined for a copyset, one node acts as the primary node and data updates from client applications first occur on that node. The primary node then ensures the data update is replicated on the other nodes in the copyset. If the primary node goes down for some reason, one of the other nodes in the copyset takes over as the primary node. Updates from client applications continue as usual without any loss of data because all of the data has been replicated from the original primary node to all of the other nodes in the copyset. The nodes of a copyset must reside on different machines to ensure that one machine failure does not cause data loss.
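The placement rule in the last sentence can be checked against an inventory of where the nodes run. This Python sketch assumes a hypothetical inventory that maps each copyset name to its nodes, and each node name to a host name; the inventory is not an ActiveSpaces configuration format, and copysets_sharing_a_host is a hypothetical helper.

```python
from typing import Dict, List


def copysets_sharing_a_host(copysets: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the copysets in which two or more nodes are placed on the same host.

    `copysets` maps a copyset name to {node name: host name}; this layout is a
    hypothetical inventory format used only for illustration.
    """
    problems = []
    for copyset, nodes in copysets.items():
        hosts = list(nodes.values())
        # Duplicate host names mean one machine failure could take down
        # more than one replica of the same data.
        if len(hosts) != len(set(hosts)):
            problems.append(copyset)
    return problems
```

For example, an inventory in which cs1_n1 and cs1_n2 both run on the same host would cause cs1 to be reported.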

Checkpoints and Copysets

Restoring a copyset from an ActiveSpaces checkpoint consists of restoring the realm service configuration, the state keeper configuration, and the data for each node of the copyset. There is nothing to restore for the copyset itself.
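Because the realm service, state keeper, and node data used for a restore must all come from the same backup, a restore script might first look for a checkpoint that exists on every node. This Python sketch is hypothetical: it assumes that each node's data directory is readable from one machine, that checkpoint directory names follow the <timestamp>_<epoch>_<counter>_<checkpoint_name> pattern with no underscore in the timestamp, and that checkpoints are ordered by epoch and then by counter.

```python
from pathlib import Path
from typing import Iterable, Optional, Set, Tuple


def epoch_and_counter(dirname: str) -> Optional[Tuple[int, int]]:
    """Parse <timestamp>_<epoch>_<counter>_<checkpoint_name>; None if it does not match."""
    parts = dirname.split("_", 3)  # maxsplit keeps underscores in the name
    if len(parts) != 4 or not (parts[1].isdigit() and parts[2].isdigit()):
        return None
    return int(parts[1]), int(parts[2])


def checkpoint_names(node_data_dir: str) -> Set[str]:
    """Names of the checkpoint directories under <node_data_dir>/checkpoints."""
    checkpoints_dir = Path(node_data_dir) / "checkpoints"
    if not checkpoints_dir.is_dir():
        return set()
    return {p.name for p in checkpoints_dir.iterdir()
            if p.is_dir() and epoch_and_counter(p.name) is not None}


def newest_common_checkpoint(node_data_dirs: Iterable[str]) -> Optional[str]:
    """Newest checkpoint present on every node, ordered by (epoch, counter)."""
    common: Optional[Set[str]] = None
    for node_dir in node_data_dirs:
        names = checkpoint_names(node_dir)
        common = names if common is None else common & names
    if not common:
        return None
    return max(common, key=lambda name: epoch_and_counter(name))
```

Intersecting the per-node sets first, rather than taking each node's newest checkpoint, avoids choosing a checkpoint that some node never completed.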