Persistence Concepts
To avoid possible data loss, you can persist data to physical media. ActiveSpaces allows you to persist data to disk storage and recover data if data loss occurs or there is a problem with cluster startup.
You can persist space data to a storage system such as a database, a key-value store, or even a file system. When you define a space and specify that it is persisted, the space data is maintained in the persistence layer and can be recovered at startup. If you need to restart the members of a metaspace and a space is defined to persist data to a datastore, run the recover space command with the as-admin tool at startup, before using the space. If you do not run the command, spaces with data in the datastore remain in the recover state and do not move to the ready state.
In addition, if the space is defined as persistent and you also specify a capacity value and an eviction policy of Least Recently Used (LRU), then you can use ActiveSpaces to cache access to the persistence layer in “cache-through” mode. In this case, applications can transparently access the data stored in the persistence layer through the space. If the data associated with a particular key field value is not in the space at the time of the read request (a “cache miss”), it is transparently fetched from the persistence layer and stored in the space. A subsequent get on the same key value can then be serviced directly, and much faster, by the space (a “cache hit”).
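The cache-through behavior described above can be illustrated with a small, generic sketch. This is not ActiveSpaces code: the class CacheThroughSpace, its methods, and the dictionary used as a stand-in for the persistence layer are all hypothetical names chosen for illustration. It only shows how a capacity limit with LRU eviction turns a first read of an evicted key into a cache miss and the next read into a cache hit.

```python
from collections import OrderedDict


class CacheThroughSpace:
    """Toy model (hypothetical, not the ActiveSpaces API) of a
    capacity-limited space in front of a persistence layer."""

    def __init__(self, store, capacity):
        self.store = store            # stand-in for the persistence layer
        self.capacity = capacity      # maximum number of entries kept in RAM
        self.cache = OrderedDict()    # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def put(self, key, value):
        self.store[key] = value       # every write reaches the persistence layer
        self._remember(key, value)

    def get(self, key):
        if key in self.cache:         # cache hit: served from RAM
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1              # cache miss: fetch from the persistence layer
        value = self.store.get(key)
        if value is not None:
            self._remember(key, value)
        return value

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry


store = {}
space = CacheThroughSpace(store, capacity=2)
for k in ("a", "b", "c"):
    space.put(k, k.upper())       # "a" is evicted from RAM when "c" arrives
first = space.get("a")            # miss: reloaded from the store
second = space.get("a")           # hit: served from RAM
```

In this sketch the application calls get in the same way for both reads; only the counters reveal which read went to the persistence layer.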
When you query a transparently cached space using a browser or a listener, the behavior differs between the shared-nothing and the shared-all persistence modes of operation:
- With the built-in shared-nothing persistence, the query can return ALL of the tuples stored in the space, regardless of whether they are currently cached in RAM or are only on persistent storage. Records that are already cached are returned faster than records that have been evicted, but every matching record is returned. However, for this to work, the fields being queried in the space MUST have indexes defined on them.
- With external shared-all persistence, listeners and browsers return only the matching records that are present in the RAM-cached subset of the space, and will NOT return records that are present only in the persistence layer at the time the query is issued.
When a space is defined as persisted, it requires at least one persister or at least the minimum allowable number of seeders.
ActiveSpaces provides two types of persistence:
- Shared-All Persistence - Support for external “shared-all” persistence is provided in the ActiveSpaces libraries. All nodes share a single persister or a set of persisters. Using the ActiveSpaces API, your application must provide an implementation of the persistence interface and must interface with the shared persistence layer of your choice.
- Shared-Nothing Persistence - Shared-nothing persistence is built into the ActiveSpaces system and implemented by the ActiveSpaces libraries. It provides a distributed backup of space data. Each node that joins a space as a seeder maintains a copy of the space data on disk, writing its data to disk and reading it back when needed for recovery and for cache misses.
When you implement persistence, you can use RAM to store either all of the data or only the most recently used data. The persistence layer holds all of the data stored in the space, while the RAM of the seeder processes is used as a transparent in-line cache of a configurable size.
Shared-All Persistence
If you implement shared-all persistence, your application must provide code to handle reads from and writes to the external persistent storage medium. You can use a traditional RDBMS (or any other centralized disk-based data store) as the persistent storage medium.
With shared-all persistence, certain space members are designated as persisters and provide the service of interacting with a persistence layer, just as other space members, the seeders, provide the basic space service.
With shared-all persistence:
- “Key operations,” for example, Get and Take operations, transparently fetch from the persistence layer any entries that have been evicted from the space.
- Queries only return matching records that are cached in RAM at the time the query is issued, but do not return records that have been evicted from the space.
- You can also query the persister for the query results. Rather than returning only the partial matching results from the in-memory data, you can run a query on the back end to retrieve all the matching rows from the data store.
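The three points above can be sketched with a toy model. The names Persister, SharedAllSpace, read, write, and query are assumptions made for this illustration and are not the ActiveSpaces persistence interface; a dictionary stands in for the external data store. The sketch shows a key operation fetching an evicted record, a space query seeing only what is in RAM, and a query run against the back end returning every match.

```python
class Persister:
    """Hypothetical persister wrapping an external store (a dict here)."""

    def __init__(self):
        self.rows = {}                       # stand-in for an RDBMS table

    def write(self, key, record):
        self.rows[key] = record

    def read(self, key):
        return self.rows.get(key)

    def query(self, predicate):
        # Runs against the full data store, not the in-memory subset.
        return {k: r for k, r in self.rows.items() if predicate(r)}


class SharedAllSpace:
    """Hypothetical space whose RAM holds only a subset of the records."""

    def __init__(self, persister):
        self.persister = persister
        self.ram = {}                        # records currently cached in RAM

    def put(self, key, record):
        self.ram[key] = record
        self.persister.write(key, record)

    def evict(self, key):
        self.ram.pop(key, None)              # the record stays in the data store

    def get(self, key):
        if key not in self.ram:              # key operation: transparent fetch
            record = self.persister.read(key)
            if record is None:
                return None
            self.ram[key] = record
        return self.ram[key]

    def query(self, predicate):
        # Space query: sees only the records that are in RAM right now.
        return {k: r for k, r in self.ram.items() if predicate(r)}


def in_oslo(record):
    return record["city"] == "Oslo"


space = SharedAllSpace(Persister())
space.put(1, {"city": "Oslo"})
space.put(2, {"city": "Oslo"})
space.evict(2)

ram_only = space.query(in_oslo)              # misses the evicted record
full = space.persister.query(in_oslo)        # back-end query finds both
fetched = space.get(2)                       # key operation reloads record 2
```

The trade-off the sketch makes visible is that the back-end query is complete but depends on the external store, while the space query is served from memory but can be partial.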
Shared-Nothing Persistence
When you use the built-in shared-nothing persistence in ActiveSpaces, your application does not need to implement code to take care of persistence. ActiveSpaces seeders use any file system accessible to them (for example, local solid-state or disk drives) as the storage medium.
When combined with in-memory indexing, shared-nothing persistence allows you to use ActiveSpaces as a distributed data store using local disks for persistent data storage and RAM as a truly transparent in-line caching layer.
With built-in shared-nothing persistence, if you define indexes on the fields used in a query, ActiveSpaces has a unique ability: because the key fields and indexes for all of the records in the data store are kept in RAM, queries return not just the matching records that are cached in RAM, but also records that have been evicted from the space.
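As a rough illustration of why an in-RAM index lets a query reach evicted records, consider the following toy model. It is a sketch under stated assumptions, not a description of how ActiveSpaces is implemented: the class SharedNothingSeeder and its attributes are hypothetical, and a dictionary stands in for the local file system. The point is that eviction removes the record from RAM but leaves its index entry in place, so the query still knows the record exists and can read it from disk.

```python
from collections import defaultdict


class SharedNothingSeeder:
    """Toy seeder (hypothetical): every record is on 'disk', a subset is
    cached in RAM, and the index over one field always stays in RAM."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.disk = {}                       # stand-in for local file storage
        self.ram = {}                        # cached records
        self.index = defaultdict(set)        # field value -> set of keys
        self.disk_reads = 0

    def put(self, key, record):
        old = self.disk.get(key)
        if old is not None:                  # drop the stale index entry on update
            self.index[old[self.indexed_field]].discard(key)
        self.disk[key] = record
        self.ram[key] = record
        self.index[record[self.indexed_field]].add(key)

    def evict(self, key):
        self.ram.pop(key, None)              # the index entry is NOT removed

    def query(self, value):
        results = {}
        for key in self.index.get(value, ()):
            if key in self.ram:
                results[key] = self.ram[key]     # fast path: cached record
            else:
                self.disk_reads += 1             # slower path: evicted record
                results[key] = self.disk[key]
        return results


seeder = SharedNothingSeeder("city")
seeder.put(1, {"city": "Oslo"})
seeder.put(2, {"city": "Oslo"})
seeder.put(3, {"city": "Rome"})
seeder.evict(2)

matches = seeder.query("Oslo")               # includes the evicted record 2
```

Here the query returns both matching records, and the disk_reads counter shows that only the evicted one required a read from storage.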