Persistence and Space Life Cycle

When the data stored in a space must survive a disaster (all seeders have crashed) or a maintenance shutdown, define the space as a persisted space.

Two choices are available for persistence: built-in shared-nothing persistence or external shared-all persistence.

At a high level, persistence is invoked at various steps in the life-cycle of a space:

  • When a space is reinstantiated after a complete shutdown or crash, the persistence “loading” phase is invoked first to “re-hydrate” the data (or rebuild indexes) from the persistent storage medium:
    • With built-in shared-nothing persistence, loading occurs in parallel on all the space's seeders at the same time.
    • With external shared-all persistence, the onLoad method of one of the registered persistence implementations is invoked.
  • When a change is made to the data in the space (because of a put, a take or any other space action that modifies the data), the change must be reflected in the persistent storage medium.

    This is done either synchronously or asynchronously, in a distributed manner, by each seeder (including those that replicate the data). In shared-nothing persistence, data is persisted to its designated local storage file folder; in external shared-all persistence mode, it is persisted by the persistence implementation's onWrite method.

    Because writes in shared-nothing mode are automatically distributed among the seeders (taking into account the degree of the space) and go to local disk on each seeder, write performance scales with the number of seeders (just as for a non-persistent space). With external shared-all persistence, however, the persistence layer is shared (a centralized RDBMS, for example), so the number of writes per second is ultimately limited by what that layer can handle and does not scale as more seeders are added to the space.

  • When memory is used as a transparent in-line cache (rather than to store the entire data set) and a request arrives to read an entry (as a result of a get, take or lock operation, for example) that is not currently in the part of the data cached in memory, the entry is loaded automatically from the persistence layer: by the seeders from their local persistent storage in shared-nothing mode, or by the persistence implementation's onRead method in external shared-all mode.
  • When a query (rather than a single entry read) is issued on the space, the result depends on the persistence mode. With external shared-all persistence, the query returns only the matching records that are in the in-memory cache at the time. With shared-nothing persistence, when indexes are used, the query returns all matching records, including those that have been evicted from memory at the time the query is issued.
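
The life-cycle steps above can be illustrated with a small, self-contained Java sketch of the external shared-all case. It is a toy model, not the product API: the interface PersistenceSketch, the classes InMemoryStore, SpaceSketch and LifeCycleDemo, and all method signatures are hypothetical, and only the callback names onLoad, onWrite and onRead are taken from the text. The sketch assumes a single seeder, synchronous writes and a least-recently-used cache; shared-nothing persistence is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical callback interface. The method names mirror the text;
// the signatures are assumptions made for this illustration.
interface PersistenceSketch {
    Map<String, String> onLoad();            // loading phase after a restart
    void onWrite(String key, String value);  // value == null means "entry removed"
    String onRead(String key);               // called on a cache miss
}

// Stand-in for a shared external store (for example, an RDBMS).
class InMemoryStore implements PersistenceSketch {
    final Map<String, String> rows = new LinkedHashMap<>();

    public Map<String, String> onLoad() { return new LinkedHashMap<>(rows); }

    public void onWrite(String key, String value) {
        if (value == null) rows.remove(key); else rows.put(key, value);
    }

    public String onRead(String key) { return rows.get(key); }
}

// Toy space that keeps at most maxCached entries in memory.
class SpaceSketch {
    private final PersistenceSketch store;
    private final LinkedHashMap<String, String> cache;

    SpaceSketch(PersistenceSketch store, int maxCached) {
        this.store = store;
        // Access-ordered map: drops the least recently used entry
        // once the cache grows beyond maxCached.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxCached;
            }
        };
        // Loading phase: re-hydrate the cache from the persistence layer.
        store.onLoad().forEach(cache::put);
    }

    void put(String key, String value) {
        cache.put(key, value);
        store.onWrite(key, value);           // synchronous write-through
    }

    String get(String key) {
        String value = cache.get(key);
        if (value == null) {                 // cache miss: read through
            value = store.onRead(key);
            if (value != null) cache.put(key, value);
        }
        return value;
    }

    String take(String key) {
        String value = get(key);
        if (value != null) {
            cache.remove(key);
            store.onWrite(key, null);        // reflect the removal in the store
        }
        return value;
    }

    // Scans only the cached entries, as described for shared-all persistence.
    List<String> query(Predicate<String> filter) {
        List<String> hits = new ArrayList<>();
        for (String value : cache.values()) {
            if (filter.test(value)) hits.add(value);
        }
        return hits;
    }
}

class LifeCycleDemo {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        SpaceSketch space = new SpaceSketch(store, 2);
        space.put("a", "apple");
        space.put("b", "avocado");
        space.put("c", "apricot");           // evicts "a" from the cache only
        // Three values match in the store, but the query sees the two cached ones.
        System.out.println(space.query(v -> v.startsWith("a")));
        // A single-entry read still finds the evicted entry through onRead.
        System.out.println(space.get("a"));
    }
}
```

In this model a single-entry read always finds the entry because a miss falls through to onRead, whereas a query never consults the store. That is the difference the last bullet describes for external shared-all persistence.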