Functional Monitoring

Functional monitoring includes watching:
  • Response codes other than 200, such as 403 and 5xx.
  • Response times.

Functional monitoring is about detecting sudden or sustained spikes in non-200 response codes, response times, or both.
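
One way to observe such spikes is to keep a sliding window of recent calls and summarize it. The following is a minimal Python sketch, not part of Mashery Local. `Sample`, `SlidingWindow`, `has_error_spike`, the `max_errors` threshold, and the 60-second default window are hypothetical names and values. The sketch also assumes that calls are recorded in timestamp order.

```python
from __future__ import annotations

from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float   # seconds, when the API call completed
    status: int        # HTTP response code returned to the client
    latency_ms: float  # end-to-end response time in milliseconds


class SlidingWindow:
    """Keeps only the API calls seen in the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self.samples: deque[Sample] = deque()

    def record(self, sample: Sample) -> None:
        # Assumes samples arrive in timestamp order; evict anything older than the window.
        self.samples.append(sample)
        cutoff = sample.timestamp - self.window_s
        while self.samples and self.samples[0].timestamp <= cutoff:
            self.samples.popleft()

    def non_200_counts(self) -> Counter:
        # Response codes other than 200 (for example 403, 596, 5xx), counted per code.
        return Counter(s.status for s in self.samples if s.status != 200)

    def mean_latency_ms(self) -> float:
        if not self.samples:
            return 0.0
        return sum(s.latency_ms for s in self.samples) / len(self.samples)

    def has_error_spike(self, max_errors: int) -> bool:
        # A "spike" here simply means more non-200 responses in the window than allowed.
        return sum(self.non_200_counts().values()) > max_errors
```

A monitoring loop would call `record` for every call and then compare `non_200_counts()` and `mean_latency_ms()` against the thresholds for each condition.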

The health of Mashery Local can deteriorate for many reasons.

For each condition below, the entry lists its thresholds, the runbook checks to perform, and comments on likely causes.

API calls are returning 596
  • Thresholds:
      Low watermark: repeated 596 responses within a short interval, say 1 minute.
      High watermark: repeated 596 responses for many APIs within a short interval, say 1 minute; OnPrem missed 2 or more cycles in a row; or the OnPrem loader failed on each of the two invocations.
  • Runbook: Check Memcache Stats, OnPremLoader status, and Memcache Loader status.
  • Comments: Possible reasons:
      1. Memcache is evicting entries because of slab fragmentation.
      2. The OnPrem loader has not successfully synced with MOM.
      3. The OnPrem loader could not update the SQL DB.
      4. The Memcache loader has not picked up updated data from SQL.
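
The 596 thresholds can be expressed as a small classifier. The following is a hedged sketch. `classify_596`, the event tuple layout, and the defaults (2 responses for "repeated", 3 APIs for "many APIs") are assumptions to tune per site. `onprem_missed_cycles` stands in for however you obtain the number of consecutive OnPrem loader cycles that were missed or failed.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# One proxied call: (timestamp in seconds, API name, HTTP status).
Event = Tuple[float, str, int]


def classify_596(
    events: Iterable[Event],
    now: float,
    window_s: float = 60.0,         # the "low interval, say 1 min"
    repeat_threshold: int = 2,      # assumption: "repeated" means at least 2
    many_apis_threshold: int = 3,   # assumption: what counts as "many APIs"
    onprem_missed_cycles: int = 0,  # consecutive OnPrem loader cycles missed or failed
) -> str:
    """Return "ok", "low" or "high" for the 596 alert (hypothetical helper)."""
    per_api = defaultdict(int)
    for ts, api, status in events:
        # Count only 596 responses inside the window (now - window_s, now].
        if status == 596 and now - window_s < ts <= now:
            per_api[api] += 1
    repeating = [api for api, n in per_api.items() if n >= repeat_threshold]
    if len(repeating) >= many_apis_threshold or onprem_missed_cycles >= 2:
        return "high"
    if repeating:
        return "low"
    return "ok"
```

Because the loader signal is a separate argument, the same check can combine traffic-side evidence with loader-side evidence.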

High latency in API traffic
  • Thresholds:
      Low watermark: the latency SLA for API calls is breached within a 30-second interval.
      High watermark: regular and repeated latency spikes, or continuously high latency over a prolonged period, say 5 minutes.
  • Runbook: Check NoSQL Stats, Memcache Stats, and TM Stats.
  • Comments: Possible reasons:
      1. OAuth token and key validation is slow.
      2. Memcache retrieval is slow.
      3. The target system is slow to respond.
      4. Requests are queued because no threads are available on the proxy.
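
The following sketch of the latency thresholds assumes a per-API SLA in milliseconds. `classify_latency` and its parameters are hypothetical. The sketch covers the low watermark (any SLA breach in the last 30 seconds) and the "continuous prolonged high latency" part of the high watermark (an unbroken, still-ongoing run of breaching calls lasting about 5 minutes). It does not detect the "regular and repeated spikes" pattern.

```python
from typing import Iterable, Optional, Tuple

# One call: (timestamp in seconds, latency in milliseconds).
Sample = Tuple[float, float]


def classify_latency(
    samples: Iterable[Sample],
    now: float,
    sla_ms: float,                   # per-API latency SLA (site-specific)
    low_window_s: float = 30.0,      # low watermark: SLA breached within 30 s
    high_duration_s: float = 300.0,  # high watermark: prolonged, say 5 min
) -> str:
    """Return "ok", "low" or "high" for the latency alert (hypothetical helper)."""
    run_start: Optional[float] = None  # first call of the current unbroken breach run
    run_end: Optional[float] = None    # latest call of that run
    recent_breach = False
    for ts, latency_ms in sorted(samples):
        if ts > now:
            continue  # ignore samples stamped after the evaluation time
        if latency_ms > sla_ms:
            if run_start is None:
                run_start = ts
            run_end = ts
            if ts > now - low_window_s:
                recent_breach = True
        else:
            run_start = run_end = None  # a call within SLA breaks the run
    # The run only counts as "continuous" if it is still ongoing near `now`.
    ongoing = run_end is not None and run_end > now - low_window_s
    if ongoing and run_end - run_start >= high_duration_s:
        return "high"
    if recent_breach:
        return "low"
    return "ok"
```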

API calls are returning 403
  • Thresholds:
      Low watermark: repeated 403 responses to calls carrying tokens within a 30-second interval.
      High watermark: prolonged 403 responses, or regular occurrence of the low-watermark condition.
  • Comments: Possible reasons: 403 errors are typically seen with OAuth token validation. This problem typically appears in a multi-zone setup because token validation depends on Cassandra replicating across zones.
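
The phrase "regular occurrence of the low watermark" suggests a two-step check: evaluate the low-watermark condition for each 30-second interval, then count how often it fires. Prolonged 403s show up the same way, as many intervals in which the condition fires. In this sketch, `classify_403`, `token_403s`, the event layout, and the repeat, lookback, and "regular" counts are all assumptions.

```python
from typing import Sequence, Tuple

# One call: (timestamp in seconds, HTTP status, whether the request carried a token).
Event = Tuple[float, int, bool]


def token_403s(events: Sequence[Event], start: float, end: float) -> int:
    """Count 403 responses to token-bearing calls in the interval (start, end]."""
    return sum(
        1 for ts, status, has_token in events
        if status == 403 and has_token and start < ts <= end
    )


def classify_403(
    events: Sequence[Event],
    now: float,
    interval_s: float = 30.0,      # low-watermark interval from the table
    repeat_threshold: int = 2,     # assumption: "repeated" means at least 2
    lookback_intervals: int = 10,  # assumption: how many past intervals to inspect
    high_threshold: int = 3,       # assumption: "regular occurrence" = 3+ intervals
) -> str:
    """Return "ok", "low" or "high" for the 403 alert (hypothetical helper)."""
    # hits[i] is True when the low watermark fired in the i-th interval back from now.
    hits = []
    for i in range(lookback_intervals):
        end = now - i * interval_s
        hits.append(token_403s(events, end - interval_s, end) >= repeat_threshold)
    if sum(hits) >= high_threshold:
        return "high"
    if hits[0]:
        return "low"
    return "ok"
```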

Continuous monitoring
  • Run a pre-configured protected or unprotected traffic call against each TM every x minutes.
  • Protected call on each TM: the call should create a new token, make a traffic call with it, and then delete the token.
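
The following is a sketch of the continuous-monitoring probe. The token and traffic-call functions are passed in as callables because the actual Mashery endpoints and credentials are site-specific. `probe`, `run_round`, `ProbeResult`, and the callable signatures are hypothetical. Omitting `create_token` produces the unprotected variant.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ProbeResult:
    tm: str                # TM the probe was sent to
    ok: bool               # True only if the call returned 200 and cleanup succeeded
    status: Optional[int]  # HTTP status of the traffic call, if it was made
    latency_ms: float      # wall-clock time for the whole probe
    error: Optional[str] = None


def probe(
    tm: str,
    call_api: Callable[[str, Optional[str]], int],          # hypothetical: returns HTTP status
    create_token: Optional[Callable[[str], str]] = None,    # None means an unprotected call
    delete_token: Optional[Callable[[str, str], None]] = None,
) -> ProbeResult:
    start = time.monotonic()
    token: Optional[str] = None
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        if create_token is not None:
            token = create_token(tm)     # 1. create a new token
        status = call_api(tm, token)     # 2. make the traffic call
    except Exception as exc:
        error = f"probe failed: {exc}"
    finally:
        if token is not None and delete_token is not None:
            try:
                delete_token(tm, token)  # 3. delete the token again
            except Exception as exc:
                error = error or f"token cleanup failed: {exc}"
    latency_ms = (time.monotonic() - start) * 1000.0
    return ProbeResult(tm, error is None and status == 200, status, latency_ms, error)


def run_round(tms: Iterable[str], **callables) -> List[ProbeResult]:
    # One probe per TM; an external scheduler (for example cron) would run this every x minutes.
    return [probe(tm, **callables) for tm in tms]
```

The sketch deletes the token in a `finally` clause. Even when the traffic call fails, the probe therefore does not leave test tokens behind.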