Checkpoint and Restore Directories

To use the Checkpoint and Restore features, or the Durable Data feature, a separate directory must be allocated for each running TIBCO Patterns - Search server. The server must have full permission to add and delete files in this directory. The data for the checkpointed tables is stored in this directory. No two running TIBCO Patterns - Search servers may share the same checkpoint directory, and no other process may modify this directory or its contents in any way. When using the directory for the first time, make sure it is empty. After that, the directory is managed by the TIBCO Patterns - Search server.

In a typical production environment, there is one TIBCO Patterns - Search server per machine, and a checkpoint directory is created on that machine. The TIBCO Patterns - Search server must always run as the same user that owns the checkpoint directory. Typically, a special user ID is created for the TIBCO Patterns - Search server, and this user owns the checkpoint directory. The server is then started with the "-R directory" argument, where "directory" is the full path name of the created directory. The TIBCO Patterns - Search server then takes full responsibility for managing the contents of the directory; no user actions are needed. When a server starts, it performs the necessary cleanup of incomplete operations.
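For example, on a Linux system the directory might be prepared and the server started as shown in the following sketch. The executable name patterns-search-server, the user name patterns, and the paths are placeholders; substitute the binary name, service account, and locations used in your installation:

# Create a dedicated service account and an empty checkpoint directory that it owns.
useradd --system patterns
mkdir -p /var/lib/patterns/checkpoint
chown patterns:patterns /var/lib/patterns/checkpoint
chmod 700 /var/lib/patterns/checkpoint   # keep the directory private to the server

# Start the server as that user, pointing -R at the empty directory.
su - patterns -c 'patterns-search-server -R /var/lib/patterns/checkpoint'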

Selecting a Checkpoint Directory

Consider the following when allocating a checkpoint or restore directory:

Accessibility. The TIBCO Patterns - Search server must be able to access the directory from the home directory in which it was started.
File Name Limits. The file system must support long file names.
Space. The file system or partition on which the directory resides must have enough space to store the data. For the Checkpoint/Restore feature, files may take up to twice the size of the raw data. For the Durable Data feature, an additional 5x space is needed for ongoing record updates. So, if a CSV dump of the table data takes about 1 GB, the checkpoint file may take up to 2 GB, and durable-data updates may take up to 6 GB. Ensure that enough space is available on the device, as TIBCO Patterns - Search servers do not check for availability of space before writing out the data.
Size Restrictions. The checkpoint data for a table is stored in a single file. For large tables, this file can exceed the maximum file size on some file systems. Ensure that the file system supports files large enough to accommodate your tables.
Network Traffic. If the directory is on a remotely mounted file system (for example, NFS), the data reads and writes involved in dumping and reading the checkpoint files can generate a very large amount of network traffic, which could affect overall network performance.
Speed. Directories on remote devices or slow devices can significantly impact checkpoint / restore and durable-data performance. A fast local device is recommended.
Reliability. Do not depend on the TIBCO Patterns - Search servers, even when using checkpoints or durable-data, as your primary secure data repository. However, if losing the checkpoint directory significantly impacts the availability of critical services, consider putting the checkpoint directory on a mirrored or RAID device.
Warning: Security. The checkpoint directory must be placed on an encrypted drive, and must not be accessible to unauthorized users.
Each running TIBCO Patterns - Search server must have its own unique checkpoint directory.

Restoring Data at Server Start-up

Specifying the -A (--auto-restore) option in addition to the -R (--restore-dir) option instructs the TIBCO Patterns - Search server to restore checkpointed data at start-up. All checkpointed tables, thesauri, models, and character-maps are restored. If any checkpointed object fails to restore, the server logs an error and exits.

Warning: Auto-restoring large tables (over 10M records) can take considerable time. Applications may encounter a timeout error if they connect to TIBCO Patterns - Search while it is auto-restoring large tables.
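For example, using the same placeholder executable name and directory as in the earlier sketch, checkpointing with automatic restore at start-up might be enabled as follows:

# -R names the checkpoint directory; -A restores all checkpointed objects at start-up.
patterns-search-server -R /var/lib/patterns/checkpoint -A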

Sometimes, when running multiple TIBCO Patterns - Search servers containing the same tables, it might be convenient to keep a single copy of the tables and have all TIBCO Patterns - Search servers load from this shared copy by using the -a (--restore-from) option. It loads all the checkpointed tables from the named directory. This does not enable checkpointing in the shared directory; it is a read-only operation. The shared directory and all of its files must be readable by the TIBCO Patterns - Search server. The shared directory must not contain incomplete checkpoints. The TIBCO Patterns - Search server validates the contents of the directory, and exits if an incomplete checkpoint is found in the shared directory.

Checkpointing, auto-restore, and shared restore can be combined. Data in the shared directory is loaded first, followed by data from the checkpoint directory, which might override the shared data.

To manage multiple TIBCO Patterns - Search servers with a common data set, designate one TIBCO Patterns - Search server as the master. This server is always started first and has checkpoint and restore enabled on the directory (use the -R and -A options). When started, this master server cleans up any incomplete operations left in the directory.

Warning: If the master TIBCO Patterns - Search server performs checkpoint operations while another server is restoring tables, it can cause the other server to fail and exit. This is especially true if a table is deleted.
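A sketch of such a deployment, again using placeholder paths and executable name: the master checkpoints into, and auto-restores from, the shared directory, while each of the other servers loads the shared data read-only with -a and checkpoints into its own private directory:

# Master: owns the shared directory, checkpoints into it, and auto-restores at start-up.
patterns-search-server -R /shared/patterns/checkpoint -A

# Each other server: loads the shared tables read-only, then checkpoints into its own directory.
patterns-search-server -a /shared/patterns/checkpoint -R /var/lib/patterns/checkpoint-2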

Checkpoint Directory Contents

The contents of the directory are managed by the TIBCO Patterns - Search server. The files and sub-directories in the directory are briefly described below:

LockFlag

Used to coordinate access to the directory.

object-name.type-flagO000

A checkpoint file for the object named object-name. This is a binary data file that contains the complete information needed to restore the in-memory object. The type-flag is a one-character flag indicating the object type. The possible values for the type-flag are:

D : a Data table

T : a Thesaurus table

C : a Character map

M : a Learn Model

For example, the checkpoint file for a Learn model named Model1 is Model1.MO000.

object-name.type-flagOhhh

A checkpoint file for a previous version of the object named object-name. The type-flag is as described in the previous entry. This is the original version of the checkpoint file being modified by the open transaction with the short ID: hhh, where hhh is three hex digits. This file exists only if the system fails during a checkpoint operation; the TIBCO Patterns - Search server cleans it up at start-up.

A transaction never has a short ID of 000, so this file is always distinct from the current committed version, which has the suffix type-flagO000. This file is deleted when the associated transaction is closed.

object-name.type-flagNhhh

A new, possibly incomplete, checkpoint file. The file was created by the open transaction with the short ID: hhh, where hhh is three hex digits. The file is removed when the transaction is closed.

object-name.type-flagRhhh

Indicates that the object named object-name, with the type indicated by the type-flag, was renamed by the open transaction with the short ID: hhh, where hhh is three hex digits. This file is deleted when the associated transaction is closed.

hhh.ipt

A log of checkpoint restore actions performed by the transaction with the short ID: hhh, where hhh is three hex digits. This file is deleted when the associated transaction is closed.

OINT

ODEF

OCONT

Sub-directories used by the Durable-Data feature.
OINT/hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh.oint
OINT/hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh.tmp

Contains naming and object-level transactional information about a data object.

ODEF/hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh.def
ODEF/hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh.tmp

Contains the immutable content of a data object (for example, the field names of a table).

OCONT/hhhhhhhh-hhhh-hhhh-hhhh-hhhhhhhhhhhh.*

Contains mutable data of a data object (for example, table records).
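As an illustration only, for a server with a checkpointed data table named Customers and a Learn model named Model1, and with no transactions open, listing the checkpoint directory might show entries such as the following (the OINT, ODEF, and OCONT sub-directories appear only when the Durable Data feature is in use):

LockFlag
Customers.DO000
Model1.MO000
OINT/
ODEF/
OCONT/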