Sizing Guide
Usually, the total data set is partitioned horizontally into copysets, each of which holds a fraction of the data. In production, a copyset typically includes more than one node for redundancy, and each node is an exact replica of the data in that copyset. To keep the sizing simple, let us start by assuming that the data resides on a single node per copyset.
The size of a copyset is determined by the following factors:
- The number of rows
- The size of a row in bytes (determined by the number of columns, the column data types, and the actual values placed in each column)
- Indexes
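As a rough illustration of how these three factors combine, the sketch below estimates the size of a single-node copyset. It is a back-of-the-envelope calculation, not the product's own sizing formula: the function name `estimate_copyset_bytes`, its parameters, and the assumption that every index stores one fixed-size entry per row are all hypothetical, and real storage engines add overhead (page headers, fragmentation, compression) that is ignored here.

```python
def estimate_copyset_bytes(row_count, avg_row_bytes, index_entry_bytes=()):
    """Rough size estimate for one copyset held on a single node.

    row_count         -- number of rows in the copyset
    avg_row_bytes     -- average size of one row in bytes
    index_entry_bytes -- average entry size in bytes for each index
                         (assumption: one entry per row per index)
    """
    if row_count < 0 or avg_row_bytes < 0 or any(b < 0 for b in index_entry_bytes):
        raise ValueError("inputs must be non-negative")

    # Factors 1 and 2: number of rows times the size of a row.
    data_bytes = row_count * avg_row_bytes

    # Factor 3: indexes, assumed here to hold one entry per row each.
    index_bytes = row_count * sum(index_entry_bytes)

    return data_bytes + index_bytes


# Hypothetical example: 10 million rows of 200 bytes with two indexes
# whose entries average 16 and 24 bytes.
example_bytes = estimate_copyset_bytes(10_000_000, 200, index_entry_bytes=(16, 24))
print(f"{example_bytes / 1e9:.1f} GB")
```

Measured row and index sizes from a representative sample of the data will give a better estimate than values derived from the schema alone, since the actual values placed in each column affect the row size.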