Administrative Concepts

These concepts and definitions pave the way to a more detailed understanding of ActiveSpaces administration.

Data Grid

A set of cooperating processes that distribute data across a set of host computers.

Three kinds of cooperating processes implement a data grid: nodes, proxies, and state keepers.

Copyset

A data grid partitions the complete set of data into copysets. Each copyset contains a portion of the full data set.

Each table row resides within only one copyset.

Partitioning

The data grid horizontally partitions the rows of a table across copysets. As a result, a single query or transaction can span many copysets.
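The following minimal sketch illustrates why one operation can touch several copysets. It assumes, purely for illustration, that rows are placed by hashing a row key; this document does not describe the actual placement scheme, and the names `copyset_for_key` and `copysets_touched` are hypothetical rather than part of any product API.

```python
import hashlib


def copyset_for_key(key: str, copyset_count: int) -> int:
    """Map a row key to exactly one copyset index.

    Hash-based placement is an assumption made for this sketch only.
    """
    if copyset_count < 1:
        raise ValueError("copyset_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return int.from_bytes(digest[:8], "big") % copyset_count


def copysets_touched(keys, copyset_count: int) -> set:
    """Return the set of copyset indexes that a multi-row operation involves."""
    return {copyset_for_key(key, copyset_count) for key in keys}


if __name__ == "__main__":
    keys = ["order-1001", "order-1002", "order-1003", "order-1004"]
    # Each key maps to one copyset, but the keys together may map to several.
    print(copysets_touched(keys, copyset_count=3))
```

Each key always maps to the same single copyset, which mirrors the rule that a row resides within only one copyset, while a set of keys can spread across many.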

Node

Nodes are processes that implement a copyset. Administrators define nodes and assign them to copysets.

Each copyset requires a primary node. Secondary nodes can provide optional backup protection.

Each node of a copyset maintains one copy of the data (that is, one copy of all the rows in that copyset).

Each node is part of only one copyset.

Replica

The number of replicas in a copyset is identical to the number of nodes that implement that copyset. Replicas provide fault tolerance and protect data against hardware failures. More replicas yield greater protection.

  • In a prototyping or testing environment, you can implement a copyset using only one node.
  • In most production environments, two nodes provide adequate protection.
  • For even stronger fault tolerance, you can use three nodes.

Replication

The replication feature, when used, provides fault tolerance by preventing data loss when a node (or the machine running the node) fails and cannot be accessed.

All nodes in a given copyset are replicas of one another, and each holds exactly the same set of data.

Each copyset has a single primary replica; the other nodes in that copyset are secondary replicas.

Every copyset in the data grid stores its slice of the data on as many replicas as the administrator chooses to configure.

Reconciling Nodes of a Copyset

When a node of a copyset is brought back online, the data for the node is reconciled with the primary node. After reconciliation, the node being brought back online resumes as a secondary node of the copyset.

For more information, see Copysets.

Proxy

Proxies are processes that mediate data grid operations on behalf of application programs.

Application programs connect to proxies, which in turn connect to nodes.

Proxy processes are independent of one another and do not require persistent state, so you can share the load of operations among multiple proxies.

State Keeper

Fault-tolerant state keeper processes determine and record the crucial internal governing information by which a data grid operates, and supply this information to the proxies and copyset nodes.

A set of fault-tolerant state keeper processes protects this crucial information and ensures nonstop access to it. One of the state keepers is designated the primary state keeper and supplies this information to the proxies and copyset nodes. If the primary state keeper goes down, one of the secondary state keepers takes over as the primary.

In a fault-tolerant set of three state keepers, a quorum of two state keepers must always be running to ensure data consistency in split-brain scenarios. If a state keeper is restarted while a quorum is running, one of the running state keepers updates the state of the restarted state keeper. If the number of running state keepers falls below the quorum and the state of a copyset changes (for example, a node goes down), operations on the data grid fail.
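The quorum arithmetic can be sketched as follows. The sketch assumes that the quorum is a simple majority of the configured state keepers, which matches the example of two out of three but is otherwise an assumption; the function names `quorum_size` and `has_quorum` are hypothetical.

```python
def quorum_size(state_keeper_count: int) -> int:
    """Return the majority quorum for a set of state keepers.

    Majority quorum is an assumption generalized from the 2-of-3 example.
    """
    if state_keeper_count < 1:
        raise ValueError("state_keeper_count must be at least 1")
    return state_keeper_count // 2 + 1


def has_quorum(running: int, configured: int) -> bool:
    """Report whether enough state keepers are running to form a quorum."""
    return running >= quorum_size(configured)


if __name__ == "__main__":
    print(quorum_size(3))    # 2
    print(has_quorum(2, 3))  # True
    print(has_quorum(1, 3))  # False
```

Under this assumption, a set of three state keepers tolerates the loss of one; losing a second leaves the set below quorum.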

For more information, see State Keeper.

Using Multiple Nodes

There are several reasons for using multiple nodes:
  • Nodes in different copysets are created with the goal of scaling horizontally.

    As a result, multiple copysets are created, each with a slice of the data.

  • Nodes in the same copyset are created to provide multiple replicas for fault tolerance.

    These contain identical copies of the data.

  • In a production environment, you can combine both of the use cases described above.

    For example, you might choose to have two replicas per copyset and multiple copysets (say three) to scale horizontally.

    In this example, your environment would have a total of six nodes.
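As a rough planning aid, the node count in the example above is the number of copysets multiplied by the number of replicas per copyset. The sketch below lays that out; the function `plan_nodes` and the naming pattern `copysetN.nodeM` are hypothetical and are not product conventions.

```python
def plan_nodes(copyset_count: int, replicas_per_copyset: int) -> dict:
    """Build a mapping from copyset name to the nodes that implement it.

    Names are illustrative placeholders only.
    """
    if copyset_count < 1 or replicas_per_copyset < 1:
        raise ValueError("both counts must be at least 1")
    layout = {}
    for c in range(1, copyset_count + 1):
        # Every node belongs to exactly one copyset.
        layout[f"copyset{c}"] = [
            f"copyset{c}.node{r}" for r in range(1, replicas_per_copyset + 1)
        ]
    return layout


if __name__ == "__main__":
    layout = plan_nodes(copyset_count=3, replicas_per_copyset=2)
    total_nodes = sum(len(nodes) for nodes in layout.values())
    print(total_nodes)  # 6
```

Adding a copyset increases capacity for horizontal scaling, while adding a replica to each copyset increases fault tolerance; both raise the total node count.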