Copyright © TIBCO Software Inc. All Rights Reserved



How to Use a Space
A space provides your application with the ability to store data, retrieve data, and be notified as soon as the data contained in the space is modified.
Besides reading and writing entries by key field values, you can use a space browser to query and iterate through, consume, or lock multiple entries stored (or updated) in a space. Your application can also enable a listener to subscribe to a space and have a callback method invoked whenever a change occurs in the space.
Batch Versus Blocking Operations
By default, spaces are distributed, which means that the servicing of requests and storage of entries for the space is implemented in a distributed manner by all of the space's seeders.
If seeders are distributed over a network, then some operations require at least one network round-trip to complete. Therefore, using the parallelized batch versions of the operations (or distributing space operations over multiple threads) rather than invoking the same blocking operation in a loop is the best way to achieve a high throughput of operations.
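The following minimal Java sketch illustrates the second option mentioned above, distributing blocking operations over multiple threads so that their round-trips overlap. It does not use the ActiveSpaces API: blockingPut is a hypothetical stand-in that simulates a round-trip with a short sleep, and the class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model; blockingPut stands in for any blocking space operation.
final class ParallelOpsSketch {
    // Simulates one blocking operation that costs a network round-trip.
    static String blockingPut(String key) {
        try {
            Thread.sleep(5); // assumed latency, for illustration only
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return key + ":OK";
    }

    // Issues the operations from a thread pool instead of a sequential loop.
    static List<String> putAllInParallel(List<String> keys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String key : keys) {
                tasks.add(() -> blockingPut(key));
            }
            List<String> results = new ArrayList<>();
            // invokeAll waits for every task and keeps the submission order.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a sequential loop the total time is roughly the number of operations multiplied by the round-trip latency; with a pool of n threads, up to n round-trips are in flight at once.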
Storing Data into a Space
Your application can store data into a space by using the space’s put method and passing it a tuple as its argument. Once the tuple is in the space, it can be accessed by any other application using that space. Existing entries are replaced with new ones, which means that if there was already a tuple with the same key field values stored in the space, it is overwritten by the new tuple.
When a tuple is stored into a space, it is validated against the space definition as follows:
If the type of a field in the tuple does not match the type given in the space's definition, ActiveSpaces attempts to convert the field’s value to the required type automatically, provided that the field type is numeric (no lexical casting is performed).
Fields marked as nullable need not be present in the tuple, but if they are present, their type must match the definition or be a type that can be upcast to it.
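The validation rules above can be pictured with the following Java sketch. It is only a model under stated assumptions: the exact set of conversions that ActiveSpaces performs is not listed here, so the sketch assumes that widening numeric conversions (for example, integer to long) are accepted and that string-to-number (lexical) conversions are rejected. The class and method names are hypothetical.

```java
// Hypothetical model of numeric upcasting; not the ActiveSpaces implementation.
final class FieldCoercionSketch {
    // Returns the value converted to the target type, or throws if the value
    // could only fit through a lexical or narrowing conversion.
    static Object coerce(Object value, Class<?> target) {
        if (target.isInstance(value)) {
            return value; // types already match
        }
        if (value instanceof Integer && target == Long.class) {
            return ((Integer) value).longValue();
        }
        if (value instanceof Integer && target == Double.class) {
            return ((Integer) value).doubleValue();
        }
        if (value instanceof Long && target == Double.class) {
            return ((Long) value).doubleValue();
        }
        if (value instanceof Float && target == Double.class) {
            return ((Float) value).doubleValue();
        }
        throw new IllegalArgumentException(
            "cannot convert " + value.getClass().getSimpleName()
                + " to " + target.getSimpleName());
    }
}
```

In this model, coerce(42, Long.class) yields a long, while coerce("42", Long.class) is rejected because it would require parsing a string.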
Retrieving Data from a Space
There are three ways to retrieve (or consume) data from a space:
Get Method A tuple space implements the associative memory paradigm and allows the application to get a complete copy of the tuple associated with specific values of its key fields.
This is done by using the space’s get method and passing a tuple containing appropriate key field values for that space. If a tuple with matching values for its key fields is currently stored in the space, the value of the status in the result object returned by the get method is equal to OK. If no tuple in the space has matching values for the key fields, the value of the status in the result object is NULL.
Callback Query Method You can create listeners on a space that invoke a user query callback function as filtered initial data and new data are pushed from the space to the listeners. For more information on listeners, see Listening to Changes.
Space Browser Query Method You can also create space browsers on the space that let users retrieve filtered data initially stored in the space and retrieve new data tuple by tuple and on demand. For more information on space browsers, see Using Space Browsers.
Which method you use to retrieve data from a space depends on the application logic of your code.
Consuming or Removing Data from a Space
You can remove tuples from a space by using the space’s take method and passing a tuple containing the appropriate key fields for that space. The take method behaves exactly like an atomic get-and-remove: If a tuple with matching values for its key fields is currently stored in the space:
The status value of the result returned by the take operation is equal to OK.
Otherwise (if there is no tuple with matching values for its key fields currently stored in the space), there is nothing to take, and the result's status is equal to NULL. Since ActiveSpaces provides immediate consistency, you have a guarantee that if two separate applications issue a take for the same entry at the same time, only one of them will see its take operation succeed; the other one will see its result's status be equal to NULL.
Unlike a simple delete operation, which succeeds even if there is nothing to delete, a take operation succeeds only if it actually removes a tuple. You can therefore use take operations (for example, through a space browser) to effectively “consume” data from a space, which makes it easy for your application to distribute workload using ActiveSpaces.
You can also perform a take operation on all or a filtered subset of the tuples contained in a space by using a space browser. For more information on space browsers, see Using Space Browsers.
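The atomic get-and-remove behavior of take can be modeled with a concurrent map. The following Java sketch is not the ActiveSpaces API; TakeSketch and its methods are hypothetical names, and a null return value stands in for a result status of NULL. It shows that when several threads race to take the same key, exactly one of them obtains the tuple.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory model of take semantics.
final class TakeSketch {
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();

    void put(String key, String value) {
        entries.put(key, value);
    }

    // Atomic get-and-remove: returns the stored value, or null if absent.
    String take(String key) {
        return entries.remove(key);
    }

    // Lets several threads race for the same key and counts the winners.
    static int raceWinners(int contenders) {
        TakeSketch space = new TakeSketch();
        space.put("job-1", "payload");
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[contenders];
        for (int i = 0; i < contenders; i++) {
            threads[i] = new Thread(() -> {
                if (space.take("job-1") != null) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }
}
```

Because only one contender ever receives a given tuple, workers can safely treat the space as a pool of work items.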
Indexing
Indexes are data structures that are used internally by ActiveSpaces seeders to speed up the filtering of data when processing queries (the filtering of tuples contained in the space when a browser or a listener is created with a filter string and a time scope of either ALL or SNAPSHOT). With larger data sets, indexes can dramatically speed up filtering of the tuples; however, using indexes requires additional memory for the processes seeding on the space, because each seeder maintains an index for the tuples that it seeds and needs memory to hold the indexing data structure. In addition, “write” requests (puts and takes) also require additional CPU time on the seeders to update the index.
ActiveSpaces lets users and administrators define as many indexes as they want on a space, as required, depending on the types of queries that will be run over the space. Indexes are part of the space's definition and are built on one or more of the fields that are defined for the space. You can build indexes on any of the fields defined for the space. Indexes have a type, which can be either “HASH” or “TREE.” Hash indexes speed up queries where the filter is an exact match ('=' operator) of a value to the field, e.g.: “field = value”. Tree indexes speed up queries where the filter is a range match ('>', '<', '>=', '<=' operators) of a value to the field, e.g. “field > value.”
If your query filter uses only one field, then you can speed it up by defining an index just on the field that it uses. If your query filter uses more than one field, then you can speed it up by creating a 'composite index' on the fields used in the filter. In this case, the order of the fields in the index definition matters when the TREE index type is used and the query filter contains both equality and range operators separated by 'AND'. For example, if the query is “field1 = value1 and field2 = value2 and field3 > value3”, then in order to benefit from the index, it should be defined on fields “field1”, “field2”, “field3” in that order (and only in that order).
A particular field can be used in more than one index. For example, if two query filters such as “field = value” and “field > value” are to be used, then you could define two indexes on the field in question, one of type 'HASH' and the other of type 'TREE', and the ActiveSpaces query optimizer will automatically use the appropriate index depending on the query being issued.
There is always an index automatically created on the key fields of the space; this index is of type HASH by default (but can be changed to a TREE type if needed).
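The difference between the two index types can be illustrated with standard Java collections. This sketch is an analogy only, not the data structures ActiveSpaces uses internally: a HashMap plays the role of a HASH index and a TreeMap plays the role of a TREE index on a hypothetical integer field named age.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: two indexes maintained over the same "age" field.
final class IndexSketch {
    private final Map<Integer, List<String>> hashIndex = new HashMap<>();
    private final TreeMap<Integer, List<String>> treeIndex = new TreeMap<>();

    // Every write updates both indexes, which costs extra CPU and memory.
    void add(String name, int age) {
        hashIndex.computeIfAbsent(age, k -> new ArrayList<>()).add(name);
        treeIndex.computeIfAbsent(age, k -> new ArrayList<>()).add(name);
    }

    // Exact match ("age = value"): a single hash lookup.
    List<String> ageEquals(int value) {
        return hashIndex.getOrDefault(value, new ArrayList<>());
    }

    // Range match ("age > value"): walks only the keys above the bound.
    List<String> ageGreaterThan(int value) {
        List<String> out = new ArrayList<>();
        for (List<String> names : treeIndex.tailMap(value, false).values()) {
            out.addAll(names);
        }
        return out;
    }
}
```

The hash structure cannot answer the range query without scanning every key, which is why a field used in both kinds of filter may warrant both index types.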
Concurrently Updating Data in a Space
When multiple processes concurrently get and update tuples in a space, two processes might try to update the same tuple at the same time. In that case, it is often necessary to serialize updates. The classic example of this scenario is a bank account balance. Suppose that a deposit and a debit to the same bank account are processed at the same time, and that each operation follows the pattern “get current account balance, add/remove the amount of the transaction, and set the new account balance.” Both transactions might start at the same time and get the same current account balance, and each then applies its own change to that value; the application that is last to set the new account balance overwrites the other application’s modification.
There are two ways to solve this problem using ActiveSpaces:
1. An optimistic approach is best when the likelihood of having a collision is low. In this case, you should make use of the space’s update method, which is an atomic compare and set operation.
This operation takes two parameters, one representing the old data that was retrieved from the space, and another one representing the new version of that data. If the old data is still in the space at the time this operation is invoked, then the operation will succeed. However, if the data in the space was changed in any way, the operation will fail, which indicates that your application should refresh its view of the data and re-apply the change.
2. A pessimistic approach to the concurrent update problem is best when there is a high likelihood of more than one process trying to update the same tuple at the same time. In this case, application programmers should first attempt to lock the tuple, and only apply their update to it after having obtained the lock. Locking is described in the following section.
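The optimistic approach amounts to a compare-and-set retry loop. The following Java sketch models it with ConcurrentMap.replace(key, oldValue, newValue), which has the same compare-and-set contract as described for the space’s update method; it is not the ActiveSpaces API, and the class, method, and account names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the bank account example with optimistic updates.
final class OptimisticUpdateSketch {
    private final ConcurrentMap<String, Long> balances = new ConcurrentHashMap<>();

    void open(String account, long balance) {
        balances.put(account, balance);
    }

    long balance(String account) {
        return balances.get(account);
    }

    // Get, compute, then compare-and-set; retry if the value changed meanwhile.
    void apply(String account, long delta) {
        while (true) {
            Long old = balances.get(account);
            if (old == null) {
                throw new IllegalArgumentException("no such account: " + account);
            }
            Long updated = old + delta;
            if (balances.replace(account, old, updated)) {
                return; // the old value was still in place, so the update won
            }
            // Someone else changed the balance: refresh and re-apply.
        }
    }

    // Runs a deposit and a debit concurrently and returns the final balance.
    static long demo() {
        OptimisticUpdateSketch bank = new OptimisticUpdateSketch();
        bank.open("acct-1", 100L);
        Thread deposit = new Thread(() -> bank.apply("acct-1", 50L));
        Thread debit = new Thread(() -> bank.apply("acct-1", -30L));
        deposit.start();
        debit.start();
        try {
            deposit.join();
            debit.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bank.balance("acct-1");
    }
}
```

Whichever thread loses the race simply re-reads the balance and tries again, so neither modification is lost and the final balance is 120 regardless of interleaving.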
Locking Data in a Space
ActiveSpaces allows users to lock records and keys in the space. The granularity of the locking in ActiveSpaces is a key, meaning that any possible key that could be used in the space can be locked, regardless of whether a tuple is actually stored in the space.
The space's lock method takes a tuple representing the key as an input parameter and, just like a get operation, can optionally return what is stored in the space at that key (if there is anything). The space’s lock method is an atomic get and lock, and takes the same argument as the get method.
After a key is locked, it is read-only for all other members of the space except for either the process or the thread that issued the lock command. The lock's scope (the thread or the process) can be specified when the space's lock method is invoked.
If a thread or process other than the locking thread or process tries to do a put, take, lock, or any operation that would modify whatever is stored for the locked key, that operation may block until the lock is cleared.
Only one member can lock a specific key at any given time. To have operations that are blocked by a lock wait for the lock to be cleared, set a lock wait value using the space's LockWait attribute.
After a key is locked, the owner of the lock can unlock it.
You can also iteratively lock all or a filtered subset of the tuples in a space by using a space browser.
Finally, you can specify a maximum time to live for locks in a space: if a lock is held for longer than the value specified in the space's LockTTL attribute, it is automatically cleared. Locks are also automatically cleared when the application that created the lock leaves the metaspace or crashes.
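Key-level locking can be pictured as a table that maps each locked key to its owner. The Java sketch below is a simplified model, not the ActiveSpaces API: the names are hypothetical, a refused operation returns false instead of blocking, and the LockWait and LockTTL behaviors are left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of per-key locks with a single owner per key.
final class KeyLockSketch {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Locks a key whether or not a tuple is stored under it.
    boolean tryLock(String key, String member) {
        String current = owners.putIfAbsent(key, member);
        return current == null || current.equals(member);
    }

    // A locked key is writable only by the member that owns the lock.
    boolean canWrite(String key, String member) {
        String owner = owners.get(key);
        return owner == null || owner.equals(member);
    }

    // Only the owner can unlock; returns false for anyone else.
    boolean unlock(String key, String member) {
        return owners.remove(key, member);
    }
}
```

In the pessimistic update pattern, a member calls tryLock first, applies its update only if the lock was granted, and then unlocks the key.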
Using Space Browsers
ActiveSpaces provides another method of interacting with spaces—space browsers. You can use space browsers when working with groups of tuples, rather than with the single tuple key lookup of the space’s get method. Space browsers allow you to iterate through a series of tuples by invoking the space browser’s next method. However, unlike a traditional iterator that works only on a snapshot of the data to be iterated through, the space browser is continuously updated according to the changes in the data contained in the space being browsed.
Changes happening to the data in the space are automatically reflected on the list of entries about to be browsed as they happen: a space browser never gives the user outdated information. For example, if an entry existed at the time the space browser was created, but it gets taken from the space before the space browser’s user gets to it, then this entry will not be returned by the space browser.
Space Browsers and the Event Browser
There are two main types of browser:
Space Browsers Allow your application to not only retrieve the next tuple in a series of tuples, but also to operate directly on the tuple. You can implement three types of space browser:
Get Browser Retrieves the next tuple in a series of tuples.
Take Browser Retrieves the next tuple in a series of tuples and consumes it.
Lock Browser Retrieves the next tuple in a series of tuples and locks it.
Event Browsers Allow you to iterate through the stream of events (changes) occurring in the space.
Here are some additional differences between space browsers and event browsers:
Space browsers and event browsers both have two methods, next() and stop(). However, a space browser's next() method returns a SpaceEntry, while the event browser's next() method returns a SpaceEvent.
A space browser also has a getType() method, which the event browser does not have.
A space browser's next method will do a get, take, or lock, according to the browser's type: GetBrowser, TakeBrowser, or LockBrowser.
The Get Browser’s next() method does a get on the next tuple to browse (very much like a regular iterator).
The Take Browser’s next() method atomically retrieves and removes the next tuple currently available to take from the space.
The Lock Browser’s next() method atomically retrieves and locks the next tuple currently available to lock in the space.
The Event Browser’s next method returns a SpaceEvent rather than a tuple.
The SpaceEvent objects returned by the event browser’s next method optionally include the initial values, that is, what was in the space at the time the event browser was created.
The initial values are presented as a continuously updated string of PUT events preceding the stream of events that happen after the creation of the event browser. Event browsers allow you to see deletions and expirations of tuples they have already iterated through.
Space browsers deliver the tuples (and initial PUT events) for the initial values in no particular order, and the order might change from one instance of a space browser to another.
Since a space browser is continuously updated, it does not have a hasNext() method; instead, it has a timeout: the amount of time the user is willing to let the next() call block in the event that there is nothing to get, take, or lock at the time it is invoked (but there may be in the future).
Continuous updating means that if multiple take browsers created on the same space are used to take tuples from the space using next, a particular tuple is only taken by one of the space browsers, effectively allowing the use of a space as a tuple queue.
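The tuple-queue pattern can be modeled with a blocking queue shared by several consumers. In this Java sketch, which does not use the ActiveSpaces API, each hypothetical TakeBrowserSketch wraps the same queue, and its next method blocks for at most the given timeout before returning null, mirroring the timeout behavior described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of take browsers consuming from one shared space.
final class TakeBrowserSketch {
    private final BlockingQueue<String> tuples;

    TakeBrowserSketch(BlockingQueue<String> shared) {
        this.tuples = shared;
    }

    // Returns the next tuple, or null if none arrives within the timeout.
    String next(long timeoutMillis) {
        try {
            return tuples.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Two browsers drain the same tuples; returns everything taken, sorted.
    static List<String> drainWithTwoBrowsers(List<String> initial) {
        BlockingQueue<String> shared = new LinkedBlockingQueue<>(initial);
        List<String> taken = Collections.synchronizedList(new ArrayList<>());
        Runnable worker = () -> {
            TakeBrowserSketch browser = new TakeBrowserSketch(shared);
            String tuple;
            while ((tuple = browser.next(50)) != null) {
                taken.add(tuple);
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> sorted = new ArrayList<>(taken);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Each tuple appears exactly once in the combined result, no matter which of the two browsers took it.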
Scopes of a Space Browser
A space browser has two kinds of scope, a time scope and a distribution scope, which are defined by setting the values of fields in the browser’s BrowserDef object:
Time Scope The time scope can be used to narrow the period of time of interest.
snapshot means that the browser starts with all the tuples in the space at the time the browser is created (or initial values), but is not updated with new tuples that are put into the space after that moment.
Note that the browser's timeout value is ignored when the time scope is snapshot, because in this case the browser will only iterate through a finite set of tuples (only those that are present in the space at the time of the browser's creation).
new means that the browser starts empty, and is updated only with tuples (or associated events) put into the space after the moment of the browser’s creation.
all means that the browser starts with all the tuples in the space, and is continuously updated with new tuples.
new_events is applicable only to event browsers, and means that the browser starts empty and is updated with all the events generated by the space after the moment of the browser's creation (unlike new, which would only deliver events associated with entries put in the space after the browser's creation time).
Distribution Scope The distribution scope can be used to narrow down the set of tuples or events being browsed.
all is used to browse over all the tuples (or associated events) in the space.
seeded is used to browse only over the tuples (or associated events) actually distributed to the member creating the browser.
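The three time scopes that apply to space browsers can be summarized in a small Java model. This is a sketch under simplifying assumptions, not the ActiveSpaces API: each tuple is represented only by the logical time at which it was put, removals are ignored, and the enum and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which tuples a browser sees under each time scope.
final class TimeScopeSketch {
    enum TimeScope { SNAPSHOT, NEW, ALL }

    // putTimes[i] is the logical time at which tuple i was put into the space.
    // Returns the indexes of the tuples the browser would iterate over.
    static List<Integer> visible(long[] putTimes, long browserCreated, TimeScope scope) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < putTimes.length; i++) {
            boolean existedAtCreation = putTimes[i] <= browserCreated;
            boolean include = scope == TimeScope.ALL
                || (scope == TimeScope.SNAPSHOT && existedAtCreation)
                || (scope == TimeScope.NEW && !existedAtCreation);
            if (include) {
                out.add(i);
            }
        }
        return out;
    }
}
```

For tuples put at times 1, 2, and 5 and a browser created at time 3, snapshot covers the first two tuples, new covers only the third, and all covers every tuple.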
Listening to Changes
ActiveSpaces can proactively notify applications of changes to the tuples stored in a space. Users can invoke the metaspace or space’s listen method to obtain a listener on spaces for receiving event notifications. There are five types of listeners:
1. PutListener The PutListener’s onPut method is invoked whenever a SpaceEntry is inserted, updated, or overwritten in the space.
2. TakeListener The TakeListener’s onTake method is invoked whenever a SpaceEntry is removed from the space.
3. ExpireListener The ExpireListener’s onExpire method is invoked whenever a SpaceEntry in the space has reached its time to live (TTL) and has expired.
4. SeedListener The SeedListener’s onSeed method is invoked whenever there is redistribution after an existing seeder leaves the space and the local node is now seeding additional entries. This is only applicable if the listener distribution scope is SEEDED.
5. UnseedListener The UnseedListener’s onUnseed method is invoked whenever there is redistribution after a new seeder joins the space and the local node stops seeding some of the entries. This is only applicable if the listener distribution scope is SEEDED.
In the ActiveSpaces Java API, listeners must implement at least one of the listener interfaces shown above. Listeners are activated using the listen method of the Metaspace or Space class.
The PutListener interface requires an onPut(PutEvent event) method.
The TakeListener interface requires an onTake(TakeEvent event) method.
The ExpireListener interface requires an onExpire(ExpireEvent event) method.
The SeedListener interface requires an onSeed(SeedEvent event) method.
The UnseedListener interface requires an onUnseed(UnseedEvent event) method.
In the C API, you must call the tibasListener_Create function and specify a single callback function that is invoked for all event types. The new tibasListener object created by tibasListener_Create is then activated using the tibasMetaspace_Listen or tibasSpace_Listen functions. The callback function is passed a tibasSpaceEvent object whose type can be determined by invoking the tibasSpaceEvent_GetType function.
ActiveSpaces generates space events of type:
TIBAS_EVENT_PUT when a tuple is inserted, overwritten, or updated.
TIBAS_EVENT_TAKE when a tuple is taken or removed.
TIBAS_EVENT_EXPIRE when a tuple reaches the end of its time to live and expires from the space.
TIBAS_EVENT_SEED when there is redistribution after a seeder leaves and the local node starts seeding additional entries. This is only applicable if the listener’s distribution scope is SEEDED.
TIBAS_EVENT_UNSEED when there is redistribution after a seeder joins and the local node stops seeding some of the entries. This is only applicable if the listener’s distribution scope is SEEDED.
You can also specify that a current snapshot of the entries stored in the space (sometimes referred to as initial values) is prepended to the stream of events. In this case, the initial values of all the tuples contained in the space at the listener’s creation time are seen as space events of type PUT preceding the current stream of events.
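The following Java sketch models two ideas from this section: initial values delivered as PUT events ahead of the live stream, and a single callback that branches on the event type (the style described for the C API). It is a self-contained illustration, not the ActiveSpaces listener API; the EventType enum, the Event class, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a listener's event stream.
final class ListenerSketch {
    enum EventType { PUT, TAKE, EXPIRE }

    static final class Event {
        final EventType type;
        final String key;

        Event(EventType type, String key) {
            this.type = type;
            this.key = key;
        }
    }

    // Prepends the entries present at creation time as PUT events.
    static List<Event> stream(List<String> keysAtCreation, List<Event> liveEvents) {
        List<Event> out = new ArrayList<>();
        for (String key : keysAtCreation) {
            out.add(new Event(EventType.PUT, key));
        }
        out.addAll(liveEvents);
        return out;
    }

    // One callback for every event type, dispatching on the type.
    static String onEvent(Event e) {
        switch (e.type) {
            case PUT:
                return "put " + e.key;
            case TAKE:
                return "take " + e.key;
            case EXPIRE:
                return "expire " + e.key;
            default:
                throw new IllegalStateException("unknown event type");
        }
    }
}
```

A listener built this way cannot tell an initial-value PUT from a live PUT by type alone, which matches the description above of initial values appearing as ordinary PUT events.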
Filters
ActiveSpaces supports the application of filters to both listeners and browsers, as well as the ability to evaluate a tuple against a filter. Filters allow your application to further refine the set of tuples it wants to work with using a space browser or event listener.
A filter string can be seen as what would follow the where clause in a select * from Space where… statement.
Examples
   field1 < (field2+field3)
   state = "CA"
   name LIKE ".*John.*" //any name with John
 
Filters can make reference to any of the fields contained in the tuples. Filters do not provide any ordering or sorting of the entries stored in the space.
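To make the meaning of the three example filters concrete, the Java sketch below expresses each one as an ordinary predicate over a tuple modeled as a map. It does not parse filter strings or call the ActiveSpaces API; the class and helper names are hypothetical, and it assumes that the LIKE pattern must match the whole field value, using java.util.regex in place of PCRE (the two agree for this simple pattern).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical predicates equivalent to the example filter strings.
final class FilterSketch {
    private static final Pattern JOHN = Pattern.compile(".*John.*");

    // Builds a tuple from alternating field names and values.
    static Map<String, Object> tuple(Object... keysAndValues) {
        Map<String, Object> t = new HashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            t.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return t;
    }

    private static int num(Map<String, Object> t, String field) {
        return ((Number) t.get(field)).intValue();
    }

    // field1 < (field2+field3)
    static boolean sumFilter(Map<String, Object> t) {
        return num(t, "field1") < num(t, "field2") + num(t, "field3");
    }

    // state = "CA"
    static boolean stateIsCa(Map<String, Object> t) {
        return "CA".equals(t.get("state"));
    }

    // name LIKE ".*John.*"
    static boolean nameLikeJohn(Map<String, Object> t) {
        Object name = t.get("name");
        return name instanceof String && JOHN.matcher((String) name).matches();
    }
}
```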
Operators Supported in Filters
Table 5 shows the operators that are supported in the ActiveSpaces filters:
IS   Only used with NULL or NOT NULL, as in “x IS NULL” or “x IS NOT NULL”.
NOR   Nor, as in “age NOT 30 NOR 40”.
Specifying a String Value in a Filter
If you specify a string value in a filter, then the filter value must be enclosed in double quotes; for example:
value = "Jones"
 
See ASQuery (Java Only) for examples of filter queries that utilize strings enclosed within double quotes.
Regex Syntax for Filter Values
Table 5 indicates several filter formats for regular expressions (regex values). For regular expressions, ActiveSpaces uses the syntax for Perl Compatible Regular Expressions (PCRE).
For general information on PCRE, see the PCRE website at the following URL:
http://www.pcre.org/
For detailed documentation on PCRE, see the text version of the man pages for PCRE at the following URL:
http://www.pcre.org/pcre.txt
Formats for Filter Values
Table 6 shows the formats for values used in filters.
 
Remotely Invoking Code over a Space
ActiveSpaces allows space members to remotely invoke code on other members of the space. This feature allows the code to be co-located with the data for optimal performance.
Execution of the Invocable interface is triggered by an application that can be running on the same member or a different member of the space. In ActiveSpaces, this is referred to as remote invocation.
The invocable code is executed either on the member that contains specified data (if the Invocable interface is used) or on specified members (if the MemberInvocable interface is used). If you use the MemberInvocable interface, your application specifies which members should execute the interface.
Compare the two approaches to updating the value of a field on all of the entries stored in a space.
One approach is to create a browser of distribution scope all on the node to serially retrieve and update each entry in the space one entry at a time.
This represents a non-distributed process, as a single node is actually doing the updating. It incurs a fair amount of network traffic, since retrieving an entry might require a round-trip over the network, and updating that entry might require another round-trip. The latency induced by those network round-trips has a negative impact on the overall throughput.
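The other approach is to use remote invocation to run a function on each of the space's seeders, with each invoked function working only on the entries seeded by its own member. For each entry, the invoked function updates the field and the entry in the same way as described for the non-distributed process, with the difference that the entry updates are performed much faster since they do not incur a network round-trip.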
Remote space invocation is available for all language bindings. It only takes care of function or method invocation and does not take care of distributing the code to the space members; that is, the function or method being invoked must already be available in the CLASSPATH of all space members.
Invocation Patterns
public interface Invocable
With the Invocable interface, the application indicates the key of an entry stored in the space. ActiveSpaces determines which space member stores the element associated with the key or which space member would be used to store the element, if the element does not exist in the space. Execution of the Invocable interface will then occur on that space member.
The code implementing the Invocable interface needs to be included in the CLASSPATH for each member of the space.
The following remote space invocation services are available:
invoke   Invokes a method only on the member seeding the key passed as an argument to the call.
invokeMember   Invokes a method only on the Space member being specified as an argument to the call.
invokeMembers   Invokes a method on all of the Space members.
invokeSeeders   Invokes a method on all of the seeder members of the Space.
All of those calls also take as arguments:
The invoke method takes a key tuple, which gets passed to the method implementing the Invocable (rather than MemberInvocable) interface in the class; the method gets invoked regardless of whether an entry has been stored in the space at that key.
Both the Invocable and the MemberInvocable interfaces return a tuple, but the remote space invocation methods return either an InvokeResult (invoke and invokeMember) or an InvokeResultList (invokeMembers and invokeSeeders), from which the Tuple can be retrieved using the getResult (or getResults) method.
Transactions
ActiveSpaces Enterprise Edition allows you to atomically perform sets of space operations using transactions. Transactions can span multiple spaces, but not multiple metaspaces. A transaction starts when an individual thread in an application creates a transaction, and terminates when either commit or rollback is invoked, at which point all space operations performed by that thread are either validated or canceled. Pending transactions may be rolled back automatically if they exceed an optional TTL (time-to-live) threshold, or when the member creating them leaves the metaspace.
Transactions can also be moved from one thread to another using the releaseTransaction() and takeTransaction() methods.
In ActiveSpaces 2.0, the only supported read isolation level is READ_COMMITTED. This isolation level applies only to your view of data modified by the transactions of other applications and threads. This means that whether in a transaction or not, you will not see uncommitted data from other transactions, but if you yourself are in a transaction you will see your own uncommitted data.
ActiveSpaces has an implied write isolation level of UNCOMMITTED, meaning that any entry potentially modified by a pending transactional operation appears to be locked for other users of the space (in which case the space’s LockWait attribute will apply).
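The read isolation behavior described above can be modeled as a private overlay of pending writes on top of the committed data. This Java sketch is a single-threaded illustration, not the ActiveSpaces transaction API: the names are hypothetical, and the locking of entries touched by pending writes is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of READ_COMMITTED visibility.
final class ReadCommittedSketch {
    private final Map<String, String> committed = new HashMap<>();

    final class Transaction {
        private final Map<String, String> pending = new HashMap<>();

        void put(String key, String value) {
            pending.put(key, value);
        }

        // A transaction sees its own uncommitted writes first.
        String get(String key) {
            return pending.containsKey(key) ? pending.get(key) : committed.get(key);
        }

        // Makes all pending writes visible to everyone at once.
        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        // Discards all pending writes.
        void rollback() {
            pending.clear();
        }
    }

    Transaction begin() {
        return new Transaction();
    }

    // Readers outside the transaction see committed data only.
    String get(String key) {
        return committed.get(key);
    }
}
```

Until commit is called, a write is visible through the transaction's own get but not through the space-level get; after rollback it is visible through neither.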
Deployment
ActiveSpaces is a peer-to-peer distributed in-memory tuple space. This means that the tuples are stored in the memory of a cluster of machines working together to offer the storage of tuples. There is no central server used for the coordination of operations, but rather any number of peers working together to offer a common service.
To store tuples, ActiveSpaces uses a distributed hashing algorithm applied on the values of the key fields of the tuple to distribute the seeding of the tuples as evenly as possible (that is, their storing and management) over a set of peers. This means that:
Given the current set of seeders (any process joined to a particular space as a seeder), any participating member of the space knows where to find the tuple associated with a particular set of key field values.
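The idea that any member can compute where a tuple lives from its key field values can be sketched in Java as follows. The actual distributed hashing algorithm used by ActiveSpaces is not described here, so this sketch substitutes simple modulo hashing as an assumption; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: route a key to a seeder with modulo hashing.
final class SeederRoutingSketch {
    // Every member that knows the same seeder list computes the same answer.
    static String seederFor(List<String> seeders, Object... keyFieldValues) {
        int hash = Arrays.hashCode(keyFieldValues);
        // floorMod keeps the index non-negative even for negative hash codes.
        return seeders.get(Math.floorMod(hash, seeders.size()));
    }
}
```

Because the result depends on the current seeder list, a change in that list moves some keys to different seeders, which is consistent with entries being redistributed when seeders join or leave.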
By specifying its role as a seeder, a process indicates its willingness to lend some of its resources—memory, CPU, and network resources—to the storing of tuples in the space. This is the means by which a space is scaled up. ActiveSpaces also allows applications to use spaces as leeches, which means that, while retaining full access to the service provided by the seeders, the application is not willing to lend any of its resources to the storing of tuples in the space. Adding or removing seeders from a space can incur performance costs by necessitating redistribution of the entries in the space, while leeches can join and leave spaces without impacting performance.
Before being able to join spaces, applications must first connect to a metaspace, which is an administrative domain in which users can create and use any number of spaces, but which also represents the cluster of machines and applications being able to communicate with each other.
Networking Considerations
Applications can connect to the metaspace either as full peers to the other peers of the metaspace, in which case they need to be able to establish TCP connections to, and receive them from, all the other full peers of the metaspace (regardless of their role in individual spaces), or as 'remote clients' that connect to the metaspace by establishing a single TCP connection to a proxying ActiveSpaces agent process (itself a fully connected peer). Fully connected peers always experience lower latency when executing space operations than remote clients, and remote clients can join spaces only as leeches (never as seeders).
Before establishing TCP connections to each other, the full peers of a metaspace need to 'discover' each other. Discovery can be done by using a reliable multicast protocol (either the built-in PGM protocol stack or, optionally, the TIBCO Rendezvous messaging system) or directly with TCP by listing a set of well-known IP addresses and ports. From a configuration standpoint, the easiest option is to use the default built-in PGM reliable multicast protocol, but this assumes that all of the full peers of the metaspace are able to exchange multicast packets with each other over the network.
In this default deployment scenario, metaspace members must be able to both receive each other's multicast transmissions and establish TCP connections to each other. To enable this, firewall settings on the host may have to be adjusted to allow sending and reception of UDP multicast packets on the port specified through the multicast URL used to connect to the metaspace, and to allow incoming and outgoing TCP connections to the ports specified in the listen URL used by the members of the metaspace to connect to it.
Also, if the host has multiple network interfaces, care must be taken to ensure that the member binds its multicast and listen transports to the appropriate interface, that is, to the network that can be used to send or receive UDP multicast packets and establish or accept TCP connections with the other members of the metaspace. The interface to use for each transport can be specified in the associated URL used to connect to the metaspace. If no interface is specified, the ActiveSpaces transport libraries use the default interface for the host (such as the interface pointed to by 'hostname').
For more information, see Joining a Space or Metaspace: Special Considerations.
Joining a Space or Metaspace: Special Considerations
A single process can connect to a given metaspace or join a given space only once:
A single application can only connect once to the same metaspace. In other words, you cannot invoke connect on the same metaspace twice and have two connections to the same metaspace from the same process. However, you can connect simultaneously to several different metaspaces, and it is possible to get a copy of a currently connected metaspace object by using the ASCommon object's methods.
When a process joins a space through its metaspace connection, it will only join a space once. If you call getSpace twice, the process will join the space the first time you call it, but the second getSpace call will return to you a new reference to the previously created space object. If you specify a different role the second time you call getSpace, then it will adjust your role on that space, but this does not mean that you have joined the same space twice.
The space object is reference-counted, and the process actually leaves the space only when the space object's leave method has been invoked as many times as the Metaspace's getSpace method was invoked for that particular space.
The role on the space (seeder or leech) is also automatically adjusted when leave methods are being invoked on those space objects.
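The reference-counting rule can be modeled with a simple counter. The Java sketch below is not the ActiveSpaces API; SpaceHandleSketch and its methods are hypothetical names, and role adjustment is left out. It shows that the space is joined on the first getSpace call and left only when leave has been called the same number of times.

```java
// Hypothetical model of a reference-counted space membership.
final class SpaceHandleSketch {
    private int references = 0;
    private boolean joined = false;

    // Joins on the first call; later calls only add a reference.
    synchronized void getSpace() {
        if (references == 0) {
            joined = true;
        }
        references++;
    }

    // Drops one reference; the space is left when the count reaches zero.
    synchronized void leave() {
        if (references == 0) {
            throw new IllegalStateException("space was not joined");
        }
        references--;
        if (references == 0) {
            joined = false;
        }
    }

    synchronized boolean isJoined() {
        return joined;
    }
}
```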
