Troubleshooting OAuth Token Migration Issues
Cassandra Write Timeout Issue
When the Migration utility runs in a cloud environment (for example, AWS or GCP), it may fail with the following error after writing a number of tokens to Cassandra:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:88)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:66)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
    ... 25 common frames omitted
This error occurs in cloud cluster setups when a write operation takes longer than the write timeout defined in cassandra.yaml (write_request_timeout_in_ms, default value 2000 ms, that is, 2 seconds).
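To confirm which timeout a node is configured with, the value can be read from cassandra.yaml. The following is a minimal Python sketch and not part of the Migration utility. The helper name read_write_timeout_ms is hypothetical, and the sketch assumes the setting appears as an uncommented top-level "key: value" line; if the key is missing or commented out, it falls back to the 2000 ms default mentioned above.

```python
import re

# Default of 2 seconds, as stated in the text above.
DEFAULT_WRITE_TIMEOUT_MS = 2000


def read_write_timeout_ms(yaml_text, default=DEFAULT_WRITE_TIMEOUT_MS):
    """Return write_request_timeout_in_ms from the text of cassandra.yaml.

    Hypothetical helper: it only matches an uncommented line that starts
    with 'write_request_timeout_in_ms:' followed by an integer.
    """
    match = re.search(
        r"^write_request_timeout_in_ms:\s*(\d+)", yaml_text, re.MULTILINE
    )
    return int(match.group(1)) if match else default


if __name__ == "__main__":
    # Inline sample instead of a real file, so the sketch runs anywhere.
    sample = (
        "read_request_timeout_in_ms: 5000\n"
        "write_request_timeout_in_ms: 2000\n"
    )
    print(read_write_timeout_ms(sample))
```

On a real node, the text would come from the configuration file, for example open("/etc/cassandra/conf/cassandra.yaml").read().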
Cassandra Commands to Check Health of Cassandra Cluster
Check the atokens table statistics to verify the write status:
[root@cass-set-0-0 builder]# nodetool tablestats -H oauth2.atokens;
Total number of tables: 57
----------------
Keyspace : oauth2
        Read Count: 0
        Read Latency: NaN ms
        Write Count: 3954158
        Write Latency: 0.06952324439235862 ms
        Pending Flushes: 0
                Table: atokens
                SSTable count: 4
                Space used (live): 83.21 MiB
                Space used (total): 83.21 MiB
                Off heap memory used (total): 0 bytes
                SSTable Compression Ratio: -1.0
                Number of partitions (estimate): 112
                Memtable cell count: 143
                Memtable data size: 31.12 KiB
                Memtable off heap memory used: 0 bytes
                Memtable switch count: 41
                Local read count: 0
                Local read latency: NaN ms
                Local write count: 242672
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 100.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 0 bytes
                Bloom filter off heap memory used: 0 bytes
                Index summary off heap memory used: 0 bytes
                Compression metadata off heap memory used: 0 bytes
                Compacted partition minimum bytes: 0
                Compacted partition maximum bytes: 0
                Compacted partition mean bytes: 0
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 1 bytes
This output tells us the following statistics about the Cassandra cluster health:
- SSTable count - The number of SSTables that contain data for this table. A high value (more than about 10) indicates that compaction is not running regularly.
- Write count - A steadily increasing value indicates continuous writes to this table. This is a good indicator that the Migration utility is actively writing to Cassandra.
- Percent repaired - Repair is an important mechanism for ensuring that copies of the data are distributed around the cluster according to the specified replication factor. A value below 50% means that more than half of the data has not been repaired and may not be replicated properly, which risks data loss.
- Dropped Mutations - The number of mutations (INSERTs, UPDATEs, or DELETEs) started on this table but not completed. A large number of dropped mutations indicates that the node is overloaded.
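The checks in the list above can be automated by parsing the text of the tablestats output. The following Python sketch is illustrative only: the function names are hypothetical, the SSTable and percent-repaired thresholds come from the guidance above, and the dropped-mutations threshold of 100 is an arbitrary assumption because the text only says "a large number".

```python
import re

# Thresholds from the guidance above, except MAX_DROPPED_MUTATIONS,
# which is an assumed placeholder value; tune all of them per cluster.
MAX_SSTABLES = 10
MIN_PERCENT_REPAIRED = 50.0
MAX_DROPPED_MUTATIONS = 100

PATTERNS = {
    "sstable_count": r"SSTable count:\s*(\d+)",
    "write_count": r"Write Count:\s*(\d+)",
    "percent_repaired": r"Percent repaired:\s*([\d.]+)",
    "dropped_mutations": r"Dropped Mutations:\s*(\d+)",
}


def parse_tablestats(output):
    """Extract selected metrics from `nodetool tablestats` text."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            stats[name] = float(match.group(1))
    return stats


def health_warnings(stats):
    """Return a list of warnings for metrics outside the thresholds."""
    warnings = []
    if stats.get("sstable_count", 0) > MAX_SSTABLES:
        warnings.append("high SSTable count: compaction may not be running regularly")
    if stats.get("percent_repaired", 100.0) < MIN_PERCENT_REPAIRED:
        warnings.append("percent repaired is below 50%")
    if stats.get("dropped_mutations", 0) > MAX_DROPPED_MUTATIONS:
        warnings.append("many dropped mutations: node may be overloaded")
    return warnings


if __name__ == "__main__":
    # Values copied from the example output above.
    sample = (
        "Write Count: 3954158\n"
        "SSTable count: 4\n"
        "Percent repaired: 100.0\n"
        "Dropped Mutations: 1 bytes\n"
    )
    print(health_warnings(parse_tablestats(sample)))
```

To confirm that the write count is increasing, run the command twice a few seconds apart and compare the write_count values from the two runs.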
Check write latency of table atokens:
[root@cass-set-1-0 builder]# nodetool tablehistograms oauth2 atokens
oauth2/atokens histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size    Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             73.46              0.00               179            10
75%             0.00            126.93              0.00               179            10
95%             0.00            219.34              0.00               179            10
98%             0.00            263.21              0.00               179            10
99%             0.00            315.85              0.00               179            10
Min             0.00             29.52              0.00               150             9
Max             0.00        1386179.89              0.00               179            10

This output tells us the following statistics about the Cassandra cluster health:
- Write latency - In the example output above (taken from a cloud setup), the maximum write latency is about 1.39 seconds (1386179.89 microseconds), which is below the configured write_request_timeout_in_ms value of 2 seconds (/etc/cassandra/conf/cassandra.yaml). If the latency exceeds the configured value, a WriteTimeoutException may occur while writing records to Cassandra.
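The comparison between the maximum write latency and the timeout can be sketched in Python as follows. This is an illustration with hypothetical function names; it assumes the column order shown in the example output above (Percentile, SSTables, Write Latency, ...) and a timeout of 2000 ms. With the example value of 1386179.89 microseconds, the slowest write consumes roughly 69% of the timeout.

```python
def max_write_latency_micros(histogram_output):
    """Return the Max write latency in microseconds from
    `nodetool tablehistograms` text, or None if no Max row is found."""
    for line in histogram_output.splitlines():
        fields = line.split()
        # Assumed row layout: Percentile, SSTables, Write Latency, ...
        if len(fields) >= 3 and fields[0] == "Max":
            return float(fields[2])
    return None


def timeout_fraction_used(latency_micros, timeout_ms=2000):
    """Fraction of the write timeout consumed by the given latency.

    A result of 1.0 or more means the write exceeded the timeout.
    """
    return latency_micros / (timeout_ms * 1000.0)


if __name__ == "__main__":
    # Rows copied from the example output above.
    sample = (
        "Min 0.00 29.52 0.00 150 9\n"
        "Max 0.00 1386179.89 0.00 179 10\n"
    )
    latency = max_write_latency_micros(sample)
    print(f"{timeout_fraction_used(latency):.0%} of the write timeout used")
```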
Check node status and heap memory usage:
[root@cass-set-1-0 builder]# nodetool info
ID                     : 7d134170-9dea-419f-8cc0-9f5692245269
Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 3.04 MiB
Generation No          : 1554892036
Uptime (seconds)       : 170138
Heap Memory (MB)       : 425.22 / 460.81
Off Heap Memory (MB)   : 0.08
Data Center            : dc2
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 68, size 5.55 KiB, capacity 23 MiB, 82400 hits, 82665 requests, 0.997 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 11 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 33, size 2.06 MiB, capacity 83 MiB, 572 misses, 1288690 requests, 1.000 recent hit rate, NaN microseconds miss latency
Percent Repaired       : 0.0%
Token                  : (invoke with -T/--tokens to see all 32 tokens)

This output tells us the following statistics about the Cassandra cluster health:
- Heap memory - If only a small portion of the total heap memory is available, operations may slow down. In this case, use the nodetool compactionstats and tablestats commands to determine the root cause.
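Heap usage can be computed from the "Heap Memory (MB)" line of the nodetool info output, which reports used and total heap separated by a slash. The following Python sketch uses a hypothetical function name and is not part of any Cassandra tooling. For the example values above (425.22 / 460.81), roughly 92% of the heap is in use.

```python
import re


def heap_usage(info_output):
    """Return (used_mb, total_mb, fraction_used) parsed from
    `nodetool info` text, or None if the heap line is not found."""
    match = re.search(
        r"^Heap Memory \(MB\)\s*:\s*([\d.]+)\s*/\s*([\d.]+)",
        info_output,
        re.MULTILINE,
    )
    if not match:
        return None
    used, total = float(match.group(1)), float(match.group(2))
    return used, total, used / total


if __name__ == "__main__":
    # Lines copied from the example output above.
    sample = (
        "Heap Memory (MB)       : 425.22 / 460.81\n"
        "Off Heap Memory (MB)   : 0.08\n"
    )
    used, total, fraction = heap_usage(sample)
    print(f"heap used: {used} of {total} MB ({fraction:.1%})")
```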
[root@cass-set-0-0 builder]# nodetool compactionstats
pending tasks: 0
This output tells us the following statistics about the Cassandra cluster health:
Copyright © Cloud Software Group, Inc. All rights reserved.