NoSQL Container Metrics (Cassandra)
NoSQL Compaction Stats
NoSQL GC Stats
NoSQL Tokens KeySpace Info
NoSQL Node Info
1. Compaction Information
{ "compaction.throughput": "16MB/s", "compaction.pending_task": "0" }
Name | Data Field | Unit | Data Type | Notes | Details |
---|---|---|---|---|---|
Compaction Throughput | compaction.throughput | MB/Second | String | Default value is 16MB/s | Defines the rate at which consolidated SSTables are written, which in turn helps mitigate the tendency of small SSTables to accumulate during compaction. For more information, see https://docs.datastax.com/en/archived/cassandra/2.1/cassandra/operations/ops_configure_compaction_t.html. |
Compaction Pending Tasks | compaction.pending_task | N/A | String | It should be 0. | Estimates the amount of compaction work still to be done. If this value keeps increasing, the compaction strategy is not keeping up or the node is running low on disk space. For more information, see thelastpickle.com/blog/2017/03/16/compaction-nuance.html. |
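
Because both compaction fields arrive as strings, a consumer has to convert them before comparing values over time. The following Python sketch is illustrative only: the helper names `parse_compaction` and `compaction_backlog_growing` are hypothetical, and the parser assumes the throughput always has the `<number>MB/s` shape shown in the sample.

```python
import re


def parse_compaction(sample):
    """Convert the string-typed compaction fields into numbers."""
    raw = sample["compaction.throughput"]
    # Assumes the "<number>MB/s" shape shown in the sample payload.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*MB/s\s*", raw)
    if match is None:
        raise ValueError(f"unexpected throughput format: {raw!r}")
    return {
        "throughput_mb_per_s": float(match.group(1)),
        "pending_tasks": int(sample["compaction.pending_task"]),
    }


def compaction_backlog_growing(pending_history):
    """True if pending tasks rose between every pair of consecutive samples."""
    pairs = zip(pending_history, pending_history[1:])
    return len(pending_history) >= 2 and all(b > a for a, b in pairs)


sample = {"compaction.throughput": "16MB/s", "compaction.pending_task": "0"}
parsed = parse_compaction(sample)
print(parsed)
```

Feeding `compaction_backlog_growing` the `pending_tasks` values from several consecutive samples gives a simple signal for the "continues to increase" condition described in the table.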
{ "gc.interval": 301998, "gc.max_elapsed": 23, "gc.total_elapsed": 23, "gc.stddev_elapsed": 0, "gc.reclaimed": 83365208, "gc.collections": 1, "gc.memory_bytes": -1 }
Name | Data Field | Unit | Data Type | Details |
---|---|---|---|---|
GC Interval | gc.interval | Milliseconds | Number | Interval between last status check and current status check. |
GC Max Time Elapsed | gc.max_elapsed | Milliseconds | Number | Maximum time taken by a single garbage collection task in this interval. |
GC Total Time Elapsed | gc.total_elapsed | Milliseconds | Number | Total time taken by all garbage collection tasks in this interval. |
GC Stddev Elapsed | gc.stddev_elapsed | Milliseconds | Number | Standard deviation of the time taken by garbage collection tasks in this interval. |
GC Reclaimed Memory | gc.reclaimed | Bytes | Number | Memory reclaimed in this interval. |
GC Collection Count | gc.collections | N/A | Number | Number of GC tasks performed in this interval. |
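
The GC fields are easier to interpret as ratios than as raw values. The sketch below derives the fraction of the interval spent in garbage collection and the mean pause per collection. The helper name `summarize_gc` is hypothetical, and the conversion to MiB assumes that `gc.reclaimed` is reported in bytes.

```python
def summarize_gc(sample):
    """Derive ratios from one GC stats sample."""
    interval_ms = sample["gc.interval"]
    total_ms = sample["gc.total_elapsed"]
    collections = sample["gc.collections"]
    return {
        # Share of the reporting interval spent in garbage collection.
        "gc_time_fraction": total_ms / interval_ms if interval_ms > 0 else 0.0,
        # Average pause per collection; 0.0 when no collection ran.
        "mean_pause_ms": total_ms / collections if collections > 0 else 0.0,
        # Assumes gc.reclaimed is a byte count.
        "reclaimed_mib": sample["gc.reclaimed"] / (1024 * 1024),
    }


sample = {
    "gc.interval": 301998,
    "gc.max_elapsed": 23,
    "gc.total_elapsed": 23,
    "gc.stddev_elapsed": 0,
    "gc.reclaimed": 83365208,
    "gc.collections": 1,
    "gc.memory_bytes": -1,
}
summary = summarize_gc(sample)
print(summary)
```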
{ "time": {epoch_time}, "message": { "tag": "tml-nosql.{HOSTNAME}.metrics.cassandra.keyspaceinfo", "ingestion_time": "{Time of ingestion to log service}", "oauth2.atokens.space.used.total": "0", "oauth2.atokens.space.used.live": "0", "oauth2.rtokens.space.used.total": "0", "oauth2.rtokens.space.used.live": "0", "oauth2.atokens.read.latency": "NaN", "oauth2.atokens.write.latency": "NaN", "oauth2.rtokens.read.latency": "NaN", "oauth2.rtokens.write.latency": "NaN", "oauth2.atokens.estimatedpartition.count": 0, "oauth2.rtokens.estimatedpartition.count": 0, "oauth2.atokens.dropped.mutation": "0", "oauth2.rtokens.dropped.mutation": "0" } }
Name | Data Field | Unit | Data Type | Notes | Details |
---|---|---|---|---|---|
Oauth2.atokens key space | oauth2.atokens.space.used.total | Bytes | String | | Total disk space used by SSTables belonging to the "atokens" column family (table) in the "Oauth2" key space, including obsolete SSTables waiting to be garbage collected. |
Oauth2.atokens key space | oauth2.atokens.space.used.live | Bytes | String | | Disk space used by only the active SSTables belonging to the "atokens" column family (table) in the "Oauth2" key space. |
Oauth2.rtokens key space | oauth2.rtokens.space.used.total | Bytes | String | | Total disk space used by SSTables belonging to the "rtokens" column family (table) in the "Oauth2" key space, including obsolete SSTables waiting to be garbage collected. |
Oauth2.rtokens key space | oauth2.rtokens.space.used.live | Bytes | String | | Disk space used by only the active SSTables belonging to the "rtokens" column family (table) in the "Oauth2" key space. |
Oauth2.atokens Read Latency | oauth2.atokens.read.latency | Milliseconds | String | | Round-trip time in milliseconds to complete the most recent read request on the table. |
Oauth2.rtokens Read Latency | oauth2.rtokens.read.latency | Milliseconds | String | | Round-trip time in milliseconds to complete the most recent read request on the table. |
Oauth2.atokens Write Latency | oauth2.atokens.write.latency | Milliseconds | String | | Round-trip time in milliseconds to complete an update to the table. This should be minimal, because a write only needs to be recorded in memory and appended to a durable commit log before it is acknowledged as a success. |
Oauth2.rtokens Write Latency | oauth2.rtokens.write.latency | Milliseconds | String | | Round-trip time in milliseconds to complete an update to the table. This can vary when the same data is updated frequently, which can also delay reads and increase read latency. |
Oauth2.atokens Partition Count | oauth2.atokens.estimatedpartition.count | N/A | Number | | Estimated number of partition keys (partitions) in this table. |
Oauth2.rtokens Partition Count | oauth2.rtokens.estimatedpartition.count | N/A | Number | | Estimated number of partition keys (partitions) in this table. |
Oauth2.atokens Dropped Mutation | oauth2.atokens.dropped.mutation | N/A | String | This value should be 0. | The number of mutations (INSERTs, UPDATEs, or DELETEs) started on this table but not completed. A large number of dropped mutations means that the node is overloaded. |
Oauth2.rtokens Dropped Mutation | oauth2.rtokens.dropped.mutation | N/A | String | This value should be 0. | The number of mutations (INSERTs, UPDATEs, or DELETEs) started on this table but not completed. A large number of dropped mutations means that the node is overloaded. |
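
Most key space fields are strings, and the latency fields can be "NaN", so they need normalizing before any comparison. The following sketch works on the inner `message` object of the sample payload. The helper names are hypothetical. Treating "NaN" as "no latency recorded" and treating the difference between total and live space as space held by obsolete SSTables are assumptions based on the descriptions in the table.

```python
import math

TABLES = ("oauth2.atokens", "oauth2.rtokens")


def latency_or_none(raw):
    """Return the latency as a float, or None when the field is "NaN"."""
    value = float(raw)  # float() accepts the string "NaN"
    return None if math.isnan(value) else value


def summarize_keyspace(message):
    """Build one numeric summary per table from the string-typed fields."""
    summary = {}
    for table in TABLES:
        total = int(message[f"{table}.space.used.total"])
        live = int(message[f"{table}.space.used.live"])
        summary[table] = {
            # Assumption: total minus live approximates obsolete SSTable space.
            "obsolete_bytes": total - live,
            "read_latency_ms": latency_or_none(message[f"{table}.read.latency"]),
            "write_latency_ms": latency_or_none(message[f"{table}.write.latency"]),
            "estimated_partitions": message[f"{table}.estimatedpartition.count"],
            "dropped_mutations": int(message[f"{table}.dropped.mutation"]),
        }
    return summary


message = {
    "oauth2.atokens.space.used.total": "0",
    "oauth2.atokens.space.used.live": "0",
    "oauth2.rtokens.space.used.total": "0",
    "oauth2.rtokens.space.used.live": "0",
    "oauth2.atokens.read.latency": "NaN",
    "oauth2.atokens.write.latency": "NaN",
    "oauth2.rtokens.read.latency": "NaN",
    "oauth2.rtokens.write.latency": "NaN",
    "oauth2.atokens.estimatedpartition.count": 0,
    "oauth2.rtokens.estimatedpartition.count": 0,
    "oauth2.atokens.dropped.mutation": "0",
    "oauth2.rtokens.dropped.mutation": "0",
}
summary = summarize_keyspace(message)
for table, stats in summary.items():
    if stats["dropped_mutations"] > 0:
        print(f"{table}: {stats['dropped_mutations']} dropped mutation(s)")
```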
{ "time": {epoch_time}, "message": { "tag": "tml-nosql.{HOSTNAME}.metrics.cassandra.nodeinfo", "ingestion_time": "{Ingestion_time_to_log_service}", "node.datacentre": "dc1", "node.key_cache_rate": 0.925, "node.exception_count": 0, "node.load": "218.98KiB", "node.percentage_repaired": "100.0%", "node.heap_used": 135.25, "node.heap_total": 502.0 } }
Name | Data Field | Unit | Data Type | Details |
---|---|---|---|---|
Node Key Cache Rate | node.key_cache_rate | % | Number | Fraction of read requests for which the key's location on disk was found in the cache. The key cache hit rate provides visibility into the effectiveness of key cache. If the key cache hit rate is consistently high (above 0.85, or 85 percent), then the vast majority of read requests are being expedited through caching. If the key cache is not consistently serving up row locations for your read requests, consider increasing the size of the cache, which can be a low-overhead tactic for improving read latency. |
Node Info Exception Count | node.exception_count | N/A | Number | Number of exceptions thrown by Cassandra while handling read and write requests. This number should be low or near 0. If the exception count is increasing, the type of exception needs to be identified. For instance, Cassandra's timeout exception reflects the incomplete (but not failed) handling of a request. Timeouts occur when the coordinator node sends a request to a replica and does not receive a response within the configurable timeout window. Timeouts are not necessarily fatal, because the coordinator stores the update and attempts to apply it later, but they can indicate network issues or even disks nearing capacity. A more worrisome exception type is the unavailable exception, which indicates that Cassandra was unable to meet the consistency requirements for a given request, usually because one or more nodes were reported as down when the request arrived. For instance, in a cluster with a replication factor of three and a consistency level of ALL, a read or write request needs to reach all three replica nodes in the cluster to succeed. |
Node Load | node.load | KiB/MiB | String | Disk space used on the node, reported with its unit (for example, 218.98KiB). This number should always be less than the total size allocated to the Cassandra node. |
Node Percent Repair | node.percentage_repaired | % | String | Because Cassandra fully embraces eventual consistency, repair is an important mechanism for making sure copies of data are shipped around the cluster to meet your specified replication factor. If this value is less than 50%, at least half of the data is not replicated properly, and data may be lost. |
Node Heap Usage | node.heap_used | N/A | Number | Total heap memory consumed by Cassandra Node. |
Node Total Heap Allocation | node.heap_total | N/A | Number | Total heap memory allocated to Cassandra Node. |
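
The node fields mix numbers with unit-suffixed strings, so a consumer has to parse `node.load` and `node.percentage_repaired` before applying the guidance in the table. The sketch below is one way to do that. The helper names are hypothetical, the `bytes`, `GiB`, and `TiB` suffixes are assumptions beyond the KiB/MiB units listed above, and the thresholds (0.85 for the key cache hit rate, 50% for repair) are taken from the table descriptions.

```python
import re

# Binary unit factors; suffixes other than KiB and MiB are assumptions.
UNIT_FACTORS = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_load(load):
    """Convert a load string such as "218.98KiB" into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(bytes|KiB|MiB|GiB|TiB)\s*", load)
    if match is None:
        raise ValueError(f"unexpected load format: {load!r}")
    return float(match.group(1)) * UNIT_FACTORS[match.group(2)]


def check_node(message):
    """Return derived values plus warnings based on the table's guidance."""
    warnings = []
    if message["node.key_cache_rate"] < 0.85:
        warnings.append("key cache hit rate below 0.85")
    if message["node.exception_count"] > 0:
        warnings.append("exceptions reported; identify the exception type")
    repaired = float(message["node.percentage_repaired"].rstrip("%"))
    if repaired < 50.0:
        warnings.append("less than 50% of data repaired")
    return {
        "load_bytes": parse_load(message["node.load"]),
        "heap_fraction": message["node.heap_used"] / message["node.heap_total"],
        "warnings": warnings,
    }


message = {
    "node.datacentre": "dc1",
    "node.key_cache_rate": 0.925,
    "node.exception_count": 0,
    "node.load": "218.98KiB",
    "node.percentage_repaired": "100.0%",
    "node.heap_used": 135.25,
    "node.heap_total": 502.0,
}
result = check_node(message)
print(result)
```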
Copyright © Cloud Software Group, Inc. All rights reserved.