NoSQL Container Metrics (Cassandra)

NoSQL Compaction Stats
1. Compaction Information
  "compaction.throughput": "16MB/s",
  "compaction.pending_task": "0"
Name | Data Field | Unit | Data Type | Notes | Details
Compaction Throughput | compaction.throughput | MB/second | String | Default value is 16MB/s. | Defines the rate at which consolidated SSTables are written, which in turn helps to mitigate the tendency of small SSTables to accumulate during compaction.
Compaction Pending Tasks | compaction.pending_task | N/A | String | Should be 0. | Estimate of the compaction work still to be done. If the value keeps increasing, the compaction strategy is not working effectively or disk space is low.
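The pending-task guidance above can be checked mechanically. The following is a minimal sketch, assuming the metrics arrive as a list of dictionaries shaped like the sample payload. The helper names `parse_throughput_mb` and `pending_tasks_growing`, and the window of three samples, are illustrative assumptions and not part of the metrics service.

```python
import re


def parse_throughput_mb(value):
    """Return the numeric MB/s part of a string such as "16MB/s"."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*MB/s\s*", value)
    if match is None:
        raise ValueError(f"unexpected throughput format: {value!r}")
    return float(match.group(1))


def pending_tasks_growing(samples, window=3):
    """True if compaction.pending_task rose strictly across the last `window` samples."""
    counts = [int(s["compaction.pending_task"]) for s in samples[-window:]]
    if len(counts) < window:
        return False  # not enough history to call it a trend
    return all(a < b for a, b in zip(counts, counts[1:]))


# Hypothetical consecutive samples; only the first matches the documented payload.
samples = [
    {"compaction.throughput": "16MB/s", "compaction.pending_task": "0"},
    {"compaction.throughput": "16MB/s", "compaction.pending_task": "4"},
    {"compaction.throughput": "16MB/s", "compaction.pending_task": "9"},
]

print(parse_throughput_mb(samples[0]["compaction.throughput"]))
print(pending_tasks_growing(samples))
```

Both fields are delivered as strings, so the sketch converts them before comparing. A strictly increasing run is only one possible definition of "continues to increase"; a longer window or a slope threshold may suit noisy clusters better.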
NoSQL GC Stats
    "gc.interval": 301998,
    "gc.max_elapsed": 23,
    "gc.total_elapsed": 23,
    "gc.stddev_elapsed": 0,
    "gc.reclaimed": 83365208,
    "gc.collections": 1,
    "gc.memory_bytes": -1
Name | Data Field | Unit | Data Type | Details
GC Interval | gc.interval | Milliseconds | Number | Interval between the last status check and the current status check.
GC Max Time Elapsed | gc.max_elapsed | Milliseconds | Number | Maximum time taken by a garbage collection task in this interval.
GC Total Time Elapsed | gc.total_elapsed | Milliseconds | Number | Total time taken by all garbage collection tasks in this interval.
GC Stddev Elapsed | gc.stddev_elapsed | Milliseconds | Number | Standard deviation of garbage collection time in this interval, in milliseconds.
GC Reclaimed Memory | gc.reclaimed | Bytes | Number | Memory reclaimed in this interval.
GC Collection Count | gc.collections | N/A | Number | Number of GC tasks performed in this interval.
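The GC fields are easier to interpret as derived ratios. The following is a minimal sketch, assuming the fields are available as a dictionary shaped like the sample payload and that `gc.reclaimed` is a byte count. The function name `gc_summary` and the derived keys are hypothetical.

```python
def gc_summary(stats):
    """Derive GC overhead and mean pause from one interval's gc.* fields."""
    interval_ms = stats["gc.interval"]
    total_ms = stats["gc.total_elapsed"]
    collections = stats["gc.collections"]
    return {
        # Fraction of the reporting interval spent in garbage collection.
        "overhead": total_ms / interval_ms if interval_ms > 0 else 0.0,
        # Average time per collection; None when no collection ran.
        "mean_pause_ms": total_ms / collections if collections > 0 else None,
        # Assumes gc.reclaimed is expressed in bytes.
        "reclaimed_mib": stats["gc.reclaimed"] / (1024 * 1024),
    }


# Values copied from the sample payload above.
sample = {
    "gc.interval": 301998,
    "gc.max_elapsed": 23,
    "gc.total_elapsed": 23,
    "gc.stddev_elapsed": 0,
    "gc.reclaimed": 83365208,
    "gc.collections": 1,
}

summary = gc_summary(sample)
print(f"GC overhead: {summary['overhead']:.4%}")
print(f"Mean pause: {summary['mean_pause_ms']} ms")
print(f"Reclaimed: {summary['reclaimed_mib']:.2f} MiB")
```

With a single collection in the interval, the mean pause equals `gc.max_elapsed` and the standard deviation is 0, which matches the sample.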
NoSQL Tokens KeySpace Info
  "time": {epoch_time},
  "message": {
    "tag": "tml-nosql.{HOSTNAME}.metrics.cassandra.keyspaceinfo",
    "ingestion_time": "{Time of ingestion to log service}",
    "": "0",
    "": "0",
    "": "0",
    "": "0",
    "": "NaN",
    "oauth2.atokens.write.latency": "NaN",
    "": "NaN",
    "oauth2.rtokens.write.latency": "NaN",
    "oauth2.atokens.estimatedpartition.count": 0,
    "oauth2.rtokens.estimatedpartition.count": 0,
    "oauth2.atokens.dropped.mutation": "0",
    "oauth2.rtokens.dropped.mutation": "0"
Name | Data Field | Unit | Data Type | Notes | Details
Oauth2.atokens key space |  | Bytes | String |  | Size of the "atokens" table (column family) in the "Oauth2" keyspace: total bytes of disk space used by SSTables belonging to this table, including obsolete SSTables waiting to be garbage collected.
Oauth2.atokens key space |  | Bytes | String |  | Size of the "atokens" table (column family) in the "Oauth2" keyspace: total bytes of disk space used by the active SSTables belonging to this table only.
Oauth2.rtokens key space |  | Bytes | String |  | Size of the "rtokens" table (column family) in the "Oauth2" keyspace: total bytes of disk space used by SSTables belonging to this table, including obsolete SSTables waiting to be garbage collected.
Oauth2.rtokens key space |  | Bytes | String |  | Size of the "rtokens" table (column family) in the "Oauth2" keyspace: total bytes of disk space used by the active SSTables belonging to this table only.
Oauth2.atokens Read Latency |  | Milliseconds | String |  | Round-trip time in milliseconds to complete the most recent read request on the table.
Oauth2.rtokens Read Latency |  | Milliseconds | String |  | Round-trip time in milliseconds to complete the most recent read request on the table.
Oauth2.atokens Write Latency | oauth2.atokens.write.latency | Milliseconds | String |  | Round-trip time in milliseconds to complete an update to the table. This should be minimal, because a write only needs to be recorded in memory and appended to a durable commit log before it is acknowledged as a success.
Oauth2.rtokens Write Latency | oauth2.rtokens.write.latency | Milliseconds | String |  | Round-trip time in milliseconds to complete an update to the table. This can vary when the same data is updated frequently, which may in turn increase read latency.
Oauth2.atokens Partition Count | oauth2.atokens.estimatedpartition.count | N/A | Number |  | Estimated number of partitions (partition keys) in the table.
Oauth2.rtokens Partition Count | oauth2.rtokens.estimatedpartition.count | N/A | Number |  | Estimated number of partitions (partition keys) in the table.
Oauth2.atokens Dropped Mutation | oauth2.atokens.dropped.mutation | N/A | String | Should be 0. | Number of mutations (INSERTs, UPDATEs, or DELETEs) started on this table but not completed. A large number of dropped mutations means that the node is overloaded.
Oauth2.rtokens Dropped Mutation | oauth2.rtokens.dropped.mutation | N/A | String | Should be 0. | Number of mutations (INSERTs, UPDATEs, or DELETEs) started on this table but not completed. A large number of dropped mutations means that the node is overloaded.
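Several of these fields are strings even though they carry numbers, and latencies appear as "NaN" when no request has been measured. The following is a minimal sketch of normalizing the named fields, assuming the `message` object has already been parsed into a dictionary. The function name `check_token_tables` and the output keys are hypothetical.

```python
import math


def check_token_tables(message, tables=("atokens", "rtokens")):
    """Normalize the oauth2.<table>.* fields and flag dropped mutations."""
    findings = {}
    for table in tables:
        prefix = f"oauth2.{table}"
        # float("NaN") parses to nan, which is mapped to None here.
        latency = float(message[f"{prefix}.write.latency"])
        dropped = int(message[f"{prefix}.dropped.mutation"])
        findings[table] = {
            "write_latency_ms": None if math.isnan(latency) else latency,
            "partitions": int(message[f"{prefix}.estimatedpartition.count"]),
            "dropped_mutations": dropped,
            # The documented expectation is that this value stays at 0.
            "has_dropped_mutations": dropped > 0,
        }
    return findings


# Named fields copied from the sample payload above.
message = {
    "oauth2.atokens.write.latency": "NaN",
    "oauth2.rtokens.write.latency": "NaN",
    "oauth2.atokens.estimatedpartition.count": 0,
    "oauth2.rtokens.estimatedpartition.count": 0,
    "oauth2.atokens.dropped.mutation": "0",
    "oauth2.rtokens.dropped.mutation": "0",
}

for table, result in check_token_tables(message).items():
    print(table, result)
```

Mapping "NaN" to `None` keeps missing measurements out of averages and threshold checks, where a real nan would silently compare as false.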
NoSQL Node Info
  "time": {epoch_time},
  "message": {
    "tag": "tml-nosql.{HOSTNAME}.metrics.cassandra.nodeinfo",
    "ingestion_time": "{Ingestion_time_to_log_service}",
    "node.datacentre": "dc1",
    "node.key_cache_rate": 0.925,
    "node.exception_count": 0,
    "node.load": "218.98KiB",
    "node.percentage_repaired": "100.0%",
    "node.heap_used": 135.25,
    "node.heap_total": 502.0
Name | Data Field | Unit | Data Type | Details
Node Key Cache Rate | node.key_cache_rate | Fraction (0 to 1) | Number | Fraction of read requests for which the key's location on disk was found in the cache. The key cache hit rate shows how effective the key cache is. If the hit rate is consistently high (above 0.85, or 85 percent), the vast majority of read requests are being expedited through caching. If the key cache is not consistently serving row locations for your read requests, consider increasing the size of the cache, which can be a low-overhead way to improve read latency.
Node Info Exception Count | node.exception_count | N/A | Number | Number of exceptions thrown by Cassandra while handling read and write requests. This number should be low or close to 0. If the exception count is increasing, find out which type of exception is being thrown. For instance, a Cassandra timeout exception reflects the incomplete (but not failed) handling of a request. Timeouts occur when the coordinator node sends a request to a replica and does not receive a response within the configurable timeout window. Timeouts are not necessarily fatal, because the coordinator stores the update and attempts to apply it later, but they can indicate network issues or disks nearing capacity. A more worrisome type is the unavailable exception, which indicates that Cassandra was unable to meet the consistency requirements for a given request, usually because one or more nodes were reported as down when the request arrived. For instance, in a cluster with a replication factor of three and a consistency level of ALL, a read or write request must reach all three replica nodes to succeed.
Node Load | node.load | KiB/MiB | String | Disk space used on the node, reported as a size with a unit suffix (for example, 218.98KiB). This number should always be less than the total size allocated to the Cassandra node.
Node Percent Repair | node.percentage_repaired | % | String | Because Cassandra fully embraces eventual consistency, repair is an important mechanism for making sure copies of data are shipped around the cluster to meet your specified replication factor. A value below 50% means that more than half of the data is not replicated properly, so data may be lost.
Node Heap Usage | node.heap_used | N/A | Number | Total heap memory consumed by the Cassandra node.
Node Total Heap Allocation | node.heap_total | N/A | Number | Total heap memory allocated to the Cassandra node.
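The thresholds described for the node fields can be combined into a simple health check. The following is a minimal sketch, assuming the `message` object has been parsed into a dictionary. The 0.85 cache rate and 50% repair thresholds come from the descriptions above; the 0.9 heap ratio limit, the set of unit suffixes, and the names `load_to_bytes` and `node_warnings` are illustrative assumptions.

```python
import re

# Assumed binary unit suffixes; only KiB appears in the sample payload.
_UNITS = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def load_to_bytes(load):
    """Convert a node.load string such as "218.98KiB" to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(bytes|KiB|MiB|GiB|TiB)\s*", load)
    if match is None:
        raise ValueError(f"unexpected load format: {load!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def node_warnings(info, min_cache_rate=0.85, min_repaired=50.0, max_heap_ratio=0.9):
    """Return human-readable warnings for one nodeinfo message."""
    warnings = []
    if info["node.key_cache_rate"] < min_cache_rate:
        warnings.append("key cache hit rate below threshold")
    if info["node.exception_count"] > 0:
        warnings.append("exceptions reported; check their type")
    repaired = float(info["node.percentage_repaired"].rstrip("%"))
    if repaired < min_repaired:
        warnings.append("percentage repaired below threshold")
    # max_heap_ratio is an assumed limit, not a documented one.
    if info["node.heap_used"] / info["node.heap_total"] > max_heap_ratio:
        warnings.append("heap usage close to total allocation")
    return warnings


# Values copied from the sample payload above.
info = {
    "node.key_cache_rate": 0.925,
    "node.exception_count": 0,
    "node.load": "218.98KiB",
    "node.percentage_repaired": "100.0%",
    "node.heap_used": 135.25,
    "node.heap_total": 502.0,
}

print(load_to_bytes(info["node.load"]))
print(node_warnings(info))
```

The sample node passes every check, so the warning list is empty. Comparing `node.load` against the disk allocation is left out because the allocated size is not part of this payload.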