define | create space

Used to create a space.

Syntax

define | create space <string>
(field name <string> type <string> [nullable <boolean>] [encrypted <boolean>] (, field name <string> type <string> [nullable <boolean>] [encrypted <boolean>])*)
     key ( [type <string>] fields (<string> (, <string>)*))
     distribution_def ('KEY','field0','field1','field2' ...)
     (index ( name <string> [type <string>] fields (<string>
     (, <string>)*)))*
[distribution_fields (<string> (, <string>)*)]
[distribution_policy <string>]
[replication_count <integer>]  [replication_policy <string>]
[host_aware_replication <boolean>]
[persistence_type <string>]  [persistence_policy <string>]
[file_sync_interval <long>]
[cache_policy <string>]
[min_seeders <integer>]
[capacity <long>]  [eviction_policy <string>]
[ttl <long>]  [lock_ttl <long>]
[lock_wait <long>]  [lock_scope <string>]
[space_wait <long>]
[write_timeout <long>]  [read_timeout <long>]
[query_timeout <long>]  [query_limit <long>]
[forget_old_value <boolean>]
[virtual_node_count <integer>]
[phase_count <integer>]  [phase_interval <long>]
[phase_ratio <integer>]
[routed <boolean>]

Remarks

The supported values for field types and for the policy, type, and scope parameters are:

Field types: boolean, char, short, integer, long, float, double, blob, string, datetime
Distribution policies: non_distributed, distributed
Persistence types: none, share_all, share_nothing
Persistence policies: sync, async
Replication policies: sync, async
Eviction policies: none, lru
Lock scopes: thread, process

Parameters

The parameters for this command are listed and described in the table define | create space Parameters.

define | create space Parameters
Parameter Description
space Required. Specify the name of the space that is to be created.
field Required.

The data type for a field must be one of the following: boolean, char, short, integer, long, float, double, string, datetime, blob.

nullable Optional. Can be either true or false (no quotes). The default is false. If a field has nullable set to true, tuples put into the space do not need to contain a field with that name.
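
The effect of nullable can be pictured with a small Python sketch. This is not ActiveSpaces code: the function name missing_required_fields and the dictionary layout of the field definitions are assumptions made only for illustration. It reports which non-nullable fields a tuple lacks.

```python
def missing_required_fields(field_defs, tuple_):
    """Return the names of non-nullable fields that the tuple does not contain."""
    return [
        f["name"]
        for f in field_defs
        # nullable defaults to false, as described above
        if not f.get("nullable", False) and f["name"] not in tuple_
    ]

# Hypothetical field definitions mirroring a two-field space.
fields = [
    {"name": "key", "type": "integer"},
    {"name": "value", "type": "string", "nullable": True},
]

ok = missing_required_fields(fields, {"key": 1})        # "value" may be omitted
bad = missing_required_fields(fields, {"value": "x"})   # "key" is required
```

In this sketch, a tuple that omits only the nullable field passes the check, while a tuple that omits the non-nullable field is flagged.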
encrypted Optional. If the field is not a key field or an index field and you have enabled ActiveSpaces security, you can specify that the data in the field is encrypted. Each (non-key, non-index) field can be made encrypted, as long as the policy for the corresponding domain allows it.
key Required. Identifies one or more fields (already specified with the field parameter) that will serve as a unique key for the space.

When you enter the key parameter, you can optionally specify the index type of the key field by including the type keyword. For example:

key (type "hash" fields (...))

The fields keyword is required.

The type keyword is optional. The default index type is the “hash” index type.

index Optional. Identifies one or more fields already specified with the “field” parameter that will serve as a secondary index. You can specify an index name and index type to be used by entering:

index (name "index1" type "tree" fields (...))

The name keyword is required.

The type keyword is optional. The default index type is “tree”.

The fields keyword and at least one field are required.

You can specify as many indexes as you need by listing them one after another following the key parameter. For example:

key (...) index(name "index1" ...) index(name "index2" ...) index (name "index3" ...)

distribution_fields Defines one or more fields as distribution fields. If a field is defined as a distribution field, then all tuples that have an identical data value for the field are stored on the same seeder.
Attention: The distribution fields must be a subset of the key fields. Otherwise, ActiveSpaces throws an exception.

The following example shows how to set up distribution fields:

key (fields ('KEY','field0','field1','field2')) distribution_fields ('KEY','field0','field1','field2') distribution_policy 'distributed' replication_count 0

In the example, KEY, field0, field1, and field2 are defined as key fields and also as distribution fields.

Note: If you define fields as distribution fields, then you must also set distribution_policy to distributed.
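
The two rules for distribution fields (they must be a subset of the key fields, and tuples with identical distribution-field values are stored on the same seeder) can be illustrated with a minimal Python sketch. It is not ActiveSpaces code: the function names, the seeder names, and the hash-modulo placement are assumptions for illustration only, and the product uses its own distribution algorithm.

```python
import hashlib

def check_distribution_fields(key_fields, distribution_fields):
    """Raise ValueError unless every distribution field is also a key field."""
    extra = [f for f in distribution_fields if f not in key_fields]
    if extra:
        raise ValueError(f"distribution fields are not key fields: {extra}")

def pick_seeder(tuple_, distribution_fields, seeders):
    """Hypothetical placement: hash only the distribution-field values."""
    material = "\x1f".join(str(tuple_[f]) for f in distribution_fields)
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    return seeders[int.from_bytes(digest[:8], "big") % len(seeders)]

key_fields = ["KEY", "field0", "field1", "field2"]
dist_fields = ["KEY", "field0"]          # a subset of the key fields
check_distribution_fields(key_fields, dist_fields)

seeders = ["seeder-a", "seeder-b", "seeder-c"]   # hypothetical seeder names
t1 = {"KEY": "u1", "field0": "x", "field1": "1", "field2": "p"}
t2 = {"KEY": "u1", "field0": "x", "field1": "2", "field2": "q"}

# The tuples differ only in fields that are not distribution fields,
# so the sketch places them on the same seeder.
same_seeder = pick_seeder(t1, dist_fields, seeders) == pick_seeder(t2, dist_fields, seeders)
```

Because only the distribution-field values feed the hash, t1 and t2 land on the same seeder even though their other key fields differ.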
distribution_policy Optional. Determines whether management of entries in the space is shared among the seeders that have joined the space (distributed) or a single seeder is responsible for all entries in the space (non_distributed). The default value is distributed.
Note: If you define fields as distribution fields, then you must also set distribution_policy to distributed.
replication_count Optional. An integer that specifies the number of times each entry should be replicated on different seeders (default: 0).
replication_policy Optional. A value of sync specifies that replication is done in synchronous mode for the space, before the operation returns. When an operation modifies one of the entries in the space, the operation only returns an indication of success when that modification has been positively replicated up to the degree of replication required for the space. A value of async specifies that replication is asynchronous.
host_aware_replication Optional. A boolean indicating whether host aware replication is enabled or disabled. Host aware replication ensures that data is not replicated on any seeders specified to be in the same seeder group (default: true, host aware replication is enabled).
persistence_type Optional. Specifies whether persistence is enabled for the space, and if so, what type of persistence to use.

To specify no persistence, specify none. To specify shared-all persistence (space members designated as persisters maintain data on disk), specify share_all. To specify shared-nothing persistence (each member maintains data on disk), specify share_nothing.

persistence_policy Optional. Specifies what type of communication is used to maintain persistence: synchronous (sync) or asynchronous (async).
file_sync_interval A long integer that indicates the amount of time (in milliseconds) to wait between persists to the data store when asynchronous (rather than synchronous) shared-nothing persistence is used (default: 10000).
min_seeders Optional. Specifies the minimum number of seeders that should be joined to the space before the space becomes ready to accept operations. The default value is 1.
capacity Optional. Specifies the maximum number of entries per seeder for the space. When the capacity is reached, the result of any additional request to put (insert) a new entry in the space depends on the value of the eviction_policy attribute. The default value is -1 (no capacity limit).
eviction_policy Optional. Determines what happens when a put operation on a space would cause a seeder to exceed the space's capacity attribute. If the value is 'none' (in quotes), no entry is evicted and the operation fails because the seeder is already at capacity. If the value is 'lru' (in quotes), the seeder evicts another entry from the space using the least recently used (LRU) eviction algorithm. The default value is 'none' (no eviction).
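
How capacity and eviction_policy interact can be sketched in Python. The SeederStore class below is a toy model written for illustration, not the ActiveSpaces implementation. Its names and the choice of RuntimeError for a failed put are assumptions.

```python
from collections import OrderedDict

class SeederStore:
    """Toy per-seeder store; not the ActiveSpaces implementation."""

    def __init__(self, capacity=-1, eviction_policy="none"):
        self.capacity = capacity              # -1 means no capacity limit
        self.eviction_policy = eviction_policy
        self.entries = OrderedDict()          # least recently used entry first

    def get(self, key):
        value = self.entries[key]
        self.entries.move_to_end(key)         # mark as most recently used
        return value

    def put(self, key, value):
        is_new = key not in self.entries
        full = self.capacity >= 0 and len(self.entries) >= self.capacity
        if is_new and full:
            if self.eviction_policy == "lru":
                self.entries.popitem(last=False)   # evict least recently used
            else:
                raise RuntimeError("seeder is at capacity")
        self.entries[key] = value
        self.entries.move_to_end(key)

lru = SeederStore(capacity=2, eviction_policy="lru")
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "b" is now the least recently used entry
lru.put("c", 3)     # store is full, so "b" is evicted
```

With eviction_policy set to "none" in the same sketch, the third put raises an error instead of evicting an entry.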
ttl Optional. Time to live in milliseconds. The default is -1 (forever).
lock_ttl Optional. Specifies in milliseconds the duration of a lock placed on the space. The default is -1 (forever).
lock_wait Optional. For a space that is locked, specifies how long, in milliseconds, a member process waits for it to become unlocked. The default is 0. Any other value must be positive.
lock_scope Optional. Specifies the lock scope to be used for each operation that includes locking.

You can specify the following:

  • thread  The lock applies to the current thread only.
  • process  The lock applies to the entire application.

    The default value is thread.

space_wait Specifies the space wait value for the space.

The space wait value is a timeout that applies to operations that cannot be processed because the space is not in the READY state, that is, the space is in the INITIAL, LOADING, RECOVER, or SUSPEND state.

write_timeout A long integer indicating the amount of time (in milliseconds) a write operation on the space can be blocked from completing the write before failing (default: 60000).
query_timeout A long integer indicating the amount of time (in milliseconds) a query on the space can take to return its results (default: -1, wait forever).
query_limit A long integer specifying the maximum number of entries to be returned by a query on this space (default: 10000). The query_limit setting is intended to help you prevent large queries from exhausting the system's memory. By default, all browsers and listeners use the query_limit defined for the space they are browsing or listening on.
Note: When there are multiple seeders for a space, the query limit is divided evenly amongst the seeders of the space so that an even number of resulting entries is taken from each seeder. When the number of entries in the result of a query approaches the query limit, it is possible that some seeders might return fewer entries than others. This is because the algorithm ActiveSpaces uses to distribute entries amongst seeders does not guarantee the entries will be evenly distributed. You should adjust the query_limit setting to accommodate the largest number of entries you will allow queries to return. When doing so, keep in mind that the query_limit should be slightly larger than the intended amount to account for the uneven distribution of entries amongst seeders.
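
The arithmetic behind this note can be sketched in Python. The helper names and the use of integer division for the even split are assumptions made for illustration. The sketch only shows why a query can return fewer entries than query_limit when entries are unevenly distributed.

```python
def per_seeder_limit(query_limit, seeder_count):
    # Assumption: the limit is split evenly using integer division.
    return query_limit // seeder_count

def entries_returned(query_limit, matches_per_seeder):
    """Total entries returned when each seeder is capped at its even share."""
    share = per_seeder_limit(query_limit, len(matches_per_seeder))
    return sum(min(matches, share) for matches in matches_per_seeder)

# 10000 matching entries spread unevenly over 4 seeders.
matches = [2700, 2300, 2600, 2400]

exact = entries_returned(10000, matches)    # each seeder capped at 2500
padded = entries_returned(11000, matches)   # each seeder capped at 2750
```

With a limit of 10000 the sketch returns 9700 entries, because two seeders hold more than their 2500 share and two hold less. Raising the limit to 11000 lets all 10000 matching entries through, which is why the limit should be set slightly above the intended amount.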
read_timeout A long integer indicating the amount of time (in milliseconds) a read operation on the space can be blocked from completing the read before failing (default: 60000).
forget_old_value A boolean value indicating whether operations that update entries in the space return any existing values for the entries (default: false, values are returned).
virtual_node_count An integer that represents the capacity of each seeder for consistent hashing (default: 1000).
phase_count An integer that indicates the number of phases to use for redistributing data within a space when a seeder leaves a space and its entries are redistributed amongst the remaining seeders of the space (default: -1, space operations block until all of the entries are redistributed).
phase_interval A value that specifies a phase interval, in milliseconds. The phase interval specifies the duration of each processing phase. The default value is 200 milliseconds.

Phase interval is used in conjunction with the phase_count parameter. For example, to specify that there will be 10 phases and each phase duration is 200 milliseconds, you would enter:

create space name "test" (field name "k" type "blob") key(fields("k")) phase_count 10 phase_interval 200

phase_ratio An integer specifying the percentage of the redistribution time that has elapsed to wait until the next phase. The longer it takes for redistribution to occur, the longer the breaks between each phase. (default: 100)
routed A boolean value indicating whether updates to the space can be routed between sites (default: false).

Examples

Simple example
define space 'myspace' (field name 'key' type 'integer',
field name 'value' type 'string') key (fields ('key'))
With additional parameters
define space 'testspace' (field name 'key' type 'double',
field name 'value' type 'blob') key (fields ('key'))
distribution_policy 'non_distributed' replication_count 1 replication_policy 'sync'
capacity 10000 eviction_policy 'none' persistence_type 'share_nothing'
persistence_policy 'sync'
With distribution key parameters
define space 'usertable3' (field name 'KEY' type 'string', 
field name 'field0' type 'string', field name 'field1' type 'string',
field name 'field2' type 'string',field name 'field3' type 'string',
field name 'field4' type 'string', field name 'field5' type 'string', 
field name 'field6' type 'string', field name 'field7' type 'string', 
field name 'field8' type 'string', field name 'field9' type 'string') 
key (fields ('KEY','field0','field1','field2')) 
distribution_def ('KEY','field0','field1','field2') 
distribution_policy 'distributed' replication_count 0