


How is a Space Implemented?
ActiveSpaces stores data in a space and handles requests to read and write data in a distributed peer-to-peer architecture. Entities that need access to a space join the space as members.
Member Roles
Members can join a space as a seeder or as a leech:
Seeders play an active role in maintaining the space by providing CPU and RAM.
Leeches play a passive role. They have access to space data but provide no resources.
Seeders
Seeders store the data contained in a space and handle requests to read and write the data. One or more seeders read and write the data in a distributed peer-to-peer manner. Seeder applications join the space and indicate their willingness to lend some of the resources of the host where they are deployed to scale the service provided by ActiveSpaces. In effect, the seeding applications have a portion of the space embedded in their process.
ActiveSpaces distributes the data stored in the space evenly among all of the processes that have joined the space as seeders. ActiveSpaces is an elastic distributed system: seeders can join and leave the space (effectively scaling it up or down) at any time without the need to restart or reconfigure any other participant in the space. When this happens, the distribution of entries is automatically rebalanced if necessary to maintain even distribution among the seeders.
Leeches
An application can also join a space as a leech. A leech does not contribute any of its host's resources to the scalability of the space, but has no limitation in functionality or ability to use the space. Unlike a seeder, a leech is not considered by the distribution algorithm and can therefore join or leave the space without causing redistribution of the data.
You can consider seeders to be “servers” for the space, and leeches to be “clients.” However, because applications can join a space as seeders, effectively embedding ActiveSpaces inside the application process, an application joining a space as a seeder is both a server and a client. Note also that the role played by an application is on a per space basis: a single application might be a seeder on one space and a leech on another space.
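For illustration, the following sketch shows how an application might join one space as a seeder and another as a leech using the Java API. The metaspace name, member name, discovery URL, and space names are placeholder values, and the exact method signatures should be verified against the API reference.

    import com.tibco.as.space.*;

    // Connect to the metaspace (names and URLs are example values).
    MemberDef memberDef = MemberDef.create("my-member", "tibpgm", null);
    Metaspace metaspace = Metaspace.connect("ms", memberDef);

    // Join one space as a seeder: this process now stores a share of its data.
    Space orders = metaspace.getSpace("orders", Member.DistributionRole.SEEDER);

    // Join another space as a leech: full functionality, no storage contributed.
    Space prices = metaspace.getSpace("prices", Member.DistributionRole.LEECH);

This also illustrates the per space nature of the role: the same process is a seeder on one space and a leech on the other.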
The as-agent Process
ActiveSpaces also includes a process called as-agent. The as-agent process provides the following capabilities:
You can set up an as-agent to provide proxy access to the metaspace for remote clients by specifying a “remote listen” URL in the as-agent command line arguments.
As-agents can also implement shared-nothing persistence.
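As a sketch of the client side of proxy access (the host name, the port, and the remote=true discovery parameter shown here are assumptions to be checked against the administration documentation), a remote client connects by pointing its discovery URL at the as-agent's remote listen address instead of joining the cluster directly:

    // Connect through an as-agent's remote listen port (example values).
    MemberDef remoteDef = MemberDef.create("remote-client",
            "tcp://agent-host:7778?remote=true", null);
    Metaspace metaspace = Metaspace.connect("ms", remoteDef);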
When to Join the Space as a Seeder or a Leech
To understand when your application should join a space as a seeder or as a leech, consider the following:
You can also use the as-agent process to “keep the data alive” when all of the instances of an application have disconnected from the metaspace.
Space Attributes
The attributes of spaces include distribution, replication, expiration (time to live, or TTL), and persistence.
Distribution
A space may be either distributed or non-distributed. When a space is distributed, responsibility for storing the tuples is shared evenly among all the seeders that have joined the space. When a space is non-distributed, a single seeder is responsible for all tuples in the space (other seeders joined to the space may still replicate these tuples if a replication degree is specified).
Distributed Space
By default, spaces are distributed. In a distributed space, management of the space’s entries is distributed among the seeders that are members of the space, and the ActiveSpaces distribution algorithm ensures that entries are distributed evenly in the space.
Figure 1, Distribution of Entries in a Space, shows how the entries for a space are distributed among the seeders in the space. Each seeder has approximately the same number of entries.
Figure 1 Distribution of Entries in a Space
To ensure the best possible (most even) distribution of entries in a space regardless of the number of entries, the granularity of the ActiveSpaces distribution algorithm is a single key field’s value. This means that an individual distribution decision is made for every entry stored in the space. There is no need to define a number of “partitions” for the space, which would provide optimal distribution only when the number of entries stored in the space (or the number of seeders for the space) is within a certain range: in essence, every entry in the space is a partition.
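Because every entry is its own distribution unit, defining a space involves declaring key fields rather than configuring partitions. A minimal sketch of such a definition, assuming the SpaceDef and FieldDef classes of the Java API (the space and field names are examples; verify the exact signatures against the API reference):

    // Define a space whose entries are distributed by the value of "orderId".
    SpaceDef spaceDef = SpaceDef.create("orders");
    spaceDef.putFieldDef(FieldDef.create("orderId", FieldDef.FieldType.LONG));
    spaceDef.putFieldDef(FieldDef.create("amount", FieldDef.FieldType.DOUBLE));
    spaceDef.setKey("orderId");  // each key value gets its own distribution decision
    metaspace.defineSpace(spaceDef);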
Non-Distributed Space
A non-distributed space is entirely managed by a single member. The primary reason for using a non-distributed space is to get absolute view synchrony, so that all changes to the space are seen in the same order (as opposed to a distributed space, where only changes to the same key are seen in the same order).
At any time, one member of the space, the seeder, is in charge of managing the entries for the space. The scalability of the space is limited to the number of entries that this single seeder can manage.
Minimum Number of Seeders
It is possible to define a minimum number of seeders for a space. If this attribute is defined, the space is not usable until the required number of seeders have joined it. Because no operation on a space can be serviced until there is at least one seeder, this setting always has an implied default value of 1.
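For example, assuming the setMinSeederCount setter on SpaceDef from the Java API (verify the name against the API reference):

    // Require at least two seeders before the space becomes usable.
    SpaceDef spaceDef = SpaceDef.create("orders");
    spaceDef.setMinSeederCount(2);  // implied default is 1
    metaspace.defineSpace(spaceDef);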
Persistence
ActiveSpaces allows you to persist data to disk storage and to recover data after data loss or a problem with cluster startup.
ActiveSpaces provides two types of persistence:
Shared-Nothing Persistence Each node that joins a space as a seeder maintains a copy of the space data on disk, writing its data to disk and reading it back when needed for recovery and for cache misses. The implementation of this mode of persistence is built into the ActiveSpaces libraries.
Shared-All Persistence All nodes share a single persister or a set of persisters. Your application must provide an implementation of the persistence interface that interfaces to the shared persistence layer of your choice.
Persistence Policy
For both types of persistence, you can specify that the persistence is maintained synchronously or asynchronously.
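Both the type and the policy are set on the space definition. A minimal sketch, assuming the PersistenceType and PersistencePolicy enums of the Java API (verify the exact names against the API reference):

    // Shared-nothing persistence with synchronous writes to disk.
    SpaceDef spaceDef = SpaceDef.create("orders");
    spaceDef.setPersistenceType(PersistenceType.SHARE_NOTHING);
    spaceDef.setPersistencePolicy(PersistencePolicy.SYNC);
    metaspace.defineSpace(spaceDef);

In general, synchronous persistence trades write latency for the assurance that an acknowledged write has been persisted, while asynchronous persistence keeps writes fast at the cost of a small window of unpersisted changes.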
Shared-Nothing Persistence
With shared-nothing persistence, each node that joins a space as a seeder maintains a copy of the space data on disk.
Where Is Persisted Data Stored?
When you configure shared-nothing persistence, you must use a unique name for each member joining the space, and you must specify an existing directory path to which ActiveSpaces has read and write access.
You can specify the directory path for data storage as follows:
The directory you specify is used as the root path under which ActiveSpaces creates its own subdirectory structure, using the format metaspace/space/member.
ActiveSpaces creates and manages persistence files automatically. You do not have to provide a filename for the stored data—the data store directory is used as the location to create and use the file.
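For example, both requirements can be met on the member definition before connecting (a sketch; the setDataStore setter name is an assumption to verify against the API reference, and the path is a placeholder):

    // A shared-nothing seeder needs a unique member name and a writable directory.
    MemberDef memberDef = MemberDef.create("seeder-01", "tibpgm", null);
    memberDef.setDataStore("/var/lib/as/datastore");  // example root path
    Metaspace metaspace = Metaspace.connect("ms", memberDef);
    // Files are created under <root>/metaspace/space/member automatically.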
For detailed information on implementing shared-nothing persistence, see Setting up Persistence.
Shared-All Persistence
With shared-all persistence, certain space members are designated as persisters and provide the service of interacting with a persistence layer, just as other space members (the seeders) provide the basic space service.
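As a sketch of what a persister implementation might look like (hedged: onLoad and onWrite are the callback names used in this chapter, but the actual interface name, signatures, and registration call are in the API reference; the in-memory map stands in for a real shared store such as an RDBMS):

    import com.tibco.as.space.*;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative persister: a real one would read and write a shared store.
    public class MapPersister {
        private final Map<Object, Tuple> store = new ConcurrentHashMap<>();

        // Invoked while the space is LOADING: feed persisted entries back in.
        public void onLoad(Space space) throws ASException {
            for (Tuple tuple : store.values()) {
                space.load(tuple);  // load is permitted in the LOADING state
            }
        }

        // Invoked for each write once the space is READY (synchronously or
        // asynchronously, depending on the configured persistence policy).
        public void onWrite(Tuple tuple) {
            store.put(tuple.get("key"), tuple);  // "key" field name is an example
        }
    }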
For detailed information on implementing shared-all persistence, see Setting up Persistence.
Terms and Concepts for Persistence
The following terms and concepts are useful for understanding persistence:
Space State Indicates whether the space can accept regular space operations. Regular space operations are accepted only when the space is in the READY state.
Persistence Type Defines what type of persistence ActiveSpaces uses. Shared-all and shared-nothing are the supported types. Only one type of persistence can be configured on the same space at the same time.
Persistence Policy Defines how changes to the space are persisted: synchronously or asynchronously.
Member Name A unique name to identify each node/seeder/member. Recommended if using shared-nothing persistence.
Data Store The file system/directory location where ActiveSpaces stores the persistence files.
Data Loss ActiveSpaces detects data loss when the number of nodes (seeders) that leave the space or fail (due to a crash) exceeds the count set for that space. In such a scenario, the space is marked as FAILED.
Space Recovery ActiveSpaces recovers a space (through user intervention) when the space state is FAILED, either due to data loss or at cluster startup.
Space Resume ActiveSpaces resumes a space (through user intervention) when the space goes into a SUSPENDED state due to the loss of a persister.
TIBCO ActiveSpaces Cluster Startup with Persistence
With shared-nothing persistence, when ActiveSpaces nodes are started for the first time and join the metaspace (and subsequently the defined space), ActiveSpaces creates new data store files. Because there are no old files to recover from, the space automatically enters the READY state and is available for space operations.
If any ActiveSpaces node is restarted, either after a failure or as a new member, the space is available for space operations only if none of the nodes in the space find files to load data from. If any node has an old file, the space state is set to WAITING (or INITIAL if nodes are started after a failure), and your application must initiate a load action.
Space Recovery with Persistence
When you configure persistence, you have the option of configuring space recovery. Space recovery has two options that you can specify through the API functions or the recover command:
Recovery with Data Use this option if data loss is not acceptable and you want to reload the data from persistence files into the space.
Recovery Without Data If data loss is acceptable, use recovery without data. With this option, ActiveSpaces does not load data back into the space from the persistence files.
You can perform recovery by using the API functions or the recover command.
For detailed information on setting up recovery, see Setting up Recovery with Persistence.
Space Resume with Shared-All Persistence
When a space loses one of its persisters, the space is set to a SUSPENDED state, which means that no writes to persistence files can happen. In this case, you can resume the space.
Space Life Cycle
The space life cycle starts when the space is first defined in the metaspace and ends when the space definition is dropped from the metaspace.
A space can be in one of the following states:
INITIAL The space has been defined and is waiting for the minimum number of seeders required by the space's definition to be reached, and for at least one persister to be registered if it is a persisted space.
LOADING The space is a persisted space that has reached the required minimum number of seeders and has at least one registered persister. The onLoad method of one of the registered persisters is being invoked, and the space data is being loaded from the persistence layer.
READY The space has the required minimum number of seeders and if persisted, has been loaded and has at least one registered persister.
Space operations that read or write data in the space are only allowed when the space is in the READY state. The only exception to this rule is that the space's load method can be invoked when the space is in the LOADING state (typically by the registered persister's onLoad method).
Your application can check that the space is in the READY state before attempting to use it by using the space's isReady() method.
Your application can also synchronize itself with the space's state by using the space's waitForReady method. This method takes a timeout, the number of milliseconds for which it blocks while waiting for the space to reach the READY state, and returns a boolean value indicating whether the timeout was reached (Java also has a convenience version of the method that takes no timeout and simply blocks until the space is ready).
Another way to synchronize an application with the space's state is to rely on the space definition's SpaceWait attribute: a configurable timeout that is used to block space operations when the space is not in the READY state until either the space becomes ready (at which point the operation is executed) or the SpaceWait timeout expires (at which point the operation will fail).
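For example, using the methods just described (the timeout value is arbitrary):

    // Block for up to ten seconds while the space reaches the READY state;
    // the boolean result reports whether the wait succeeded or timed out.
    boolean ready = space.waitForReady(10000);

    // Or check the current state without blocking.
    if (space.isReady()) {
        // Safe to perform space operations here.
    }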
Persistence and Space Life Cycle
When a space needs to be persistent, so that the data stored in it does not disappear after a disaster (for example, all seeders crash) or a maintenance shutdown, define it as a persisted space.
Two choices are available for persistence: built-in shared-nothing persistence or external shared-all persistence.
At a high level, persistence is invoked at various steps in the life cycle of a space:
When the space is loading, previously persisted data is read back into the space. With external shared-all persistence, the onLoad method of one of the registered persistence implementations is invoked.
When data in the space is written, it is persisted either synchronously or asynchronously in a distributed manner by each seeder (including those that replicate the data). Data is persisted to its designated local storage folder in shared-nothing persistence mode, or by the persistence implementation's onWrite method in external shared-all persistence mode.
Because in shared-nothing mode writes are automatically distributed among the seeders (taking into account the replication degree of the space) and are done to local disk on each seeder, write performance scales with the number of seeders (just as for a non-persistent space). However, when shared-all external persistence is used, because the persistence layer is shared (a centralized RDBMS, for example), the number of writes per second is ultimately limited by what the external persistence layer can handle and does not scale as more seeders are added to the space.
