Concurrent File Loading

In a clustered environment, concurrency issues and possible data corruption can result if more than one FileWatcher instance attempts to process the same incoming file. The lock mechanism addresses this by using a lock file for the DataSet as a semaphore: the lock file is created before processing begins and is removed once processing is done.

Before processing an incoming file, FileWatcher attempts to acquire a lock using the lock file specified in the DataSet parameters. To acquire the lock, it attempts to create a file named after the DataSet and suffixed with the name specified in the LockFile parameter (<DataSet name>_<LockFile>). If the lock is acquired for the specified DataSet, FileWatcher processes the incoming files for that DataSet and deletes the lock file after processing is done.

If the lock file for a DataSet already exists and the lock cannot be acquired, FileWatcher skips processing for that DataSet and continues with any other DataSets that have been configured.
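
The acquire, process, and release cycle described above can be sketched as follows. This is a minimal illustration, not FileWatcher's actual implementation: the class DataSetLock and its methods are hypothetical names, and the sketch assumes the lock directory is on a file system where file creation is atomic.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of FileWatcher.
final class DataSetLock {
    private DataSetLock() {}

    // Naming convention from the text: <DataSet name>_<LockFile>.
    static Path lockPath(Path lockDir, String dataSetName, String lockFile) {
        return lockDir.resolve(dataSetName + "_" + lockFile);
    }

    // Returns true if this call created the lock file,
    // false if the lock file already existed.
    static boolean tryAcquire(Path lock) throws IOException {
        try {
            // createFile checks for existence and creates the file in one step.
            Files.createFile(lock);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Releases the lock by deleting the lock file.
    static void release(Path lock) throws IOException {
        Files.deleteIfExists(lock);
    }

    // Runs the work only if the lock was acquired; returns whether it ran.
    // A false return corresponds to skipping the DataSet.
    static boolean processIfUnlocked(Path lock, Runnable work) throws IOException {
        if (!tryAcquire(lock)) {
            return false;
        }
        try {
            work.run();
        } finally {
            release(lock); // delete the lock file even if processing fails
        }
        return true;
    }
}
```

A caller would loop over its configured DataSets, call processIfUnlocked for each one, and move on to the next DataSet whenever it returns false.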

The lock file is created in the <Relative>/<URI> path provided in the URIInfo section of the DataSet. It contains information about the instance that acquired the lock, so that an abandoned lock can be cleaned up by the correct instance. The instance ID written to the lock file is taken from the JVM argument NODE_ID.
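
As an illustration of how the owning instance might be recorded and later checked, consider the sketch below. It rests on several assumptions that the text does not confirm: that NODE_ID is passed as a system property (for example -DNODE_ID=node1), that the lock file holds only the instance ID as plain text, and that cleanup means deleting a lock file whose recorded ID matches the current instance. The class LockOwner and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of FileWatcher.
final class LockOwner {
    private LockOwner() {}

    // Assumption: NODE_ID is supplied as a system property (-DNODE_ID=...).
    static String currentNodeId() {
        return System.getProperty("NODE_ID", "unknown");
    }

    // Creates the lock file and records the owner in it.
    // CREATE_NEW makes this fail with FileAlreadyExistsException if the lock is held.
    static void writeOwner(Path lock, String nodeId) throws IOException {
        Files.write(lock, nodeId.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    // Returns true if the lock file exists and names the given instance.
    static boolean isOwnedBy(Path lock, String nodeId) throws IOException {
        try {
            String owner = new String(Files.readAllBytes(lock), StandardCharsets.UTF_8).trim();
            return owner.equals(nodeId);
        } catch (NoSuchFileException e) {
            return false; // no lock file, so nobody owns it
        }
    }

    // Deletes an abandoned lock only if this instance created it.
    // Returns true if a lock file was deleted.
    static boolean cleanUpIfOwned(Path lock, String nodeId) throws IOException {
        return isOwnedBy(lock, nodeId) && Files.deleteIfExists(lock);
    }
}
```

Under these assumptions, an instance restarting after a crash could call cleanUpIfOwned with its own ID to remove a lock it left behind, while leaving locks held by other instances untouched.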

A LockFile can be specified for each DataSet. If no LockFile parameter is specified in the DataSet information, the global LockFile parameter is used to determine the lock file suffix.
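
The fallback from the DataSet-level LockFile to the global LockFile can be sketched as below. The class LockFileNames and its methods are hypothetical, and treating a blank value the same as a missing one is an assumption made for this example.

```java
// Hypothetical helper for illustration only; not part of FileWatcher.
final class LockFileNames {
    private LockFileNames() {}

    // Uses the DataSet-level LockFile when present, otherwise the global one.
    // Assumption: null or blank means "not specified".
    static String resolveLockFile(String dataSetLockFile, String globalLockFile) {
        boolean missing = dataSetLockFile == null || dataSetLockFile.trim().isEmpty();
        return missing ? globalLockFile : dataSetLockFile;
    }

    // Builds the lock file name as <DataSet name>_<LockFile>.
    static String lockFileName(String dataSetName, String dataSetLockFile, String globalLockFile) {
        return dataSetName + "_" + resolveLockFile(dataSetLockFile, globalLockFile);
    }
}
```

For example, with a global LockFile of "lock", a DataSet named Orders with no LockFile of its own would use Orders_lock, while one that sets LockFile to "busy" would use Orders_busy.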