Resource Monitoring

Container Metrics

These metrics denote the utilization of CPU, memory, network and disks assigned to a container or pod.

Note: CPU, Memory and Network

In the absence of limits and reservations on the containers or pods, all containers and pods can utilize available resources of the node on which they are deployed and running.

Container CPU Metrics

The captured metrics reflect the percentage of CPU used by the container or pod in total, by user-space and kernel processes, and a breakdown of usage per core.

CPU Metrics example:

{
  "time": 1554194126,
  "message": {
    "cpu_p": 41.88333333333333,
    "user_p": 25.033333333333335,
    "system_p": 16.85,
    "cpu0.p_cpu": 41.88333333333333,
    "cpu0.p_user": 25.033333333333335,
    "cpu0.p_system": 16.85,
    "ingestion_time": "2019-04-02T08:35:26+00:00",
    "tag": "tml-nosql.6c680d874676.metrics.cpu"
  }
}

Metric Name	Field Name	Units	Data Type	Notes
Total CPU consumption	cpu_p	%	number	Total CPU usage across all cores available to the container, including user and kernel processes. If the container can use 4 cores, the value can reach 400%.
CPU consumption by user processes	user_p	%	number	Total CPU used by user processes across all cores
CPU consumption by kernel processes	system_p	%	number	Total CPU used by kernel processes across all cores.
Total usage of core N	cpuN.p_cpu	%	number	Usage of core N by user and kernel processes
User processes usage of core N	cpuN.p_user	%	number	Usage of core N by user processes
Kernel processes usage of core N	cpuN.p_system	%	number	Usage of core N by kernel processes
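As a minimal sketch of how a consumer might read a CPU record like the example above, the following splits it into totals and a per-core breakdown and normalizes cpu_p to a 0-100% scale. The function name `summarize_cpu` and the `num_cores` parameter are hypothetical: the core count is not part of the record and must come from the caller.

```python
import re

# CPU record copied from the example above.
SAMPLE_CPU = {
    "time": 1554194126,
    "message": {
        "cpu_p": 41.88333333333333,
        "user_p": 25.033333333333335,
        "system_p": 16.85,
        "cpu0.p_cpu": 41.88333333333333,
        "cpu0.p_user": 25.033333333333335,
        "cpu0.p_system": 16.85,
        "ingestion_time": "2019-04-02T08:35:26+00:00",
        "tag": "tml-nosql.6c680d874676.metrics.cpu",
    },
}

# Matches per-core fields such as "cpu0.p_cpu", "cpu3.p_user", "cpu1.p_system".
PER_CORE_FIELD = re.compile(r"cpu(\d+)\.p_(cpu|user|system)")


def summarize_cpu(record: dict, num_cores: int) -> dict:
    """Split a CPU record into totals and a per-core breakdown.

    num_cores (hypothetical, caller-supplied) is the number of cores the
    container may use; dividing cpu_p by it maps 0..(100 * cores)% to 0..100%.
    """
    msg = record["message"]
    per_core: dict = {}
    for key, value in msg.items():
        match = PER_CORE_FIELD.fullmatch(key)
        if match:
            core, kind = int(match.group(1)), match.group(2)
            per_core.setdefault(core, {})[kind] = value
    return {
        "total_pct": msg["cpu_p"],
        "normalized_pct": msg["cpu_p"] / num_cores,
        "user_pct": msg["user_p"],
        "system_pct": msg["system_p"],
        "per_core": per_core,
    }


summary = summarize_cpu(SAMPLE_CPU, num_cores=4)
print(round(summary["normalized_pct"], 2), summary["per_core"][0]["user"])
```

In the example record, user_p plus system_p equals cpu_p, which is a cheap sanity check a consumer can apply to each record.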

Container Memory

Pod/Container memory metrics example:

{
  "time": 1554198840,
  "message": {
    "Mem.total": 4045520,
    "Mem.used": 3932664,
    "Mem.free": 112856,
    "Swap.total": 1928204,
    "Swap.used": 1483436,
    "Swap.free": 444768,
    "ingestion_time": "2019-04-02T09:54:00+00:00",
    "tag": "tml-log.6a6873b34d5e.metrics.mem"
  }
}

Metric Name	Field Name	Units	Data Type	Notes
Total memory (RAM)	Mem.total	bytes	Number	Total memory available to container or pod in bytes
Used memory (RAM)	Mem.used	bytes	Number	Memory used by the container or pod in bytes
Free memory (RAM)	Mem.free	bytes	Number	Available free RAM in bytes
Total swap space	Swap.total	bytes	Number	Total swap space
Used swap space	Swap.used	bytes	Number	Used swap space
Free swap space	Swap.free	bytes	Number	Free swap space
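A minimal sketch of turning a memory record into usage ratios, assuming records shaped like the example above. Ratios are independent of the unit the values are reported in, so the sketch does not depend on it. The helper names `used_ratio` and `memory_usage` are hypothetical.

```python
# Memory record copied from the example above.
SAMPLE_MEM = {
    "time": 1554198840,
    "message": {
        "Mem.total": 4045520,
        "Mem.used": 3932664,
        "Mem.free": 112856,
        "Swap.total": 1928204,
        "Swap.used": 1483436,
        "Swap.free": 444768,
        "ingestion_time": "2019-04-02T09:54:00+00:00",
        "tag": "tml-log.6a6873b34d5e.metrics.mem",
    },
}


def used_ratio(used: int, total: int) -> float:
    """Fraction of a resource in use; 0.0 when total is 0 (for example, no swap)."""
    return used / total if total else 0.0


def memory_usage(record: dict) -> dict:
    """Return RAM and swap utilization as fractions between 0 and 1."""
    msg = record["message"]
    return {
        "mem": used_ratio(msg["Mem.used"], msg["Mem.total"]),
        "swap": used_ratio(msg["Swap.used"], msg["Swap.total"]),
    }


usage = memory_usage(SAMPLE_MEM)
print(f"RAM {usage['mem']:.1%}, swap {usage['swap']:.1%}")
```

In the example, used plus free equals total for both RAM and swap, and RAM is about 97% used with swap also heavily used, which is the kind of reading the alert thresholds later in this page are meant to catch.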

Container Disk

The captured metrics reflect the number of bytes read and written at the time of collection.

Pod/Container disk metrics example:

{
  "time": 1554193560,
  "message": {
    "read_size": 7029587968,
    "write_size": 14102749184,
    "ingestion_time": "2019-04-02T08:26:00+00:00",
    "tag": "tml-log.6a6873b34d5e.metrics.disk"
  }
}

Metric Name	Field Name	Units	Data Type
Total bytes read from disk	read_size	bytes	Number
Total bytes written to disk	write_size	bytes	Number
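If read_size and write_size are running totals, as the table above describes them, throughput can be derived from two consecutive samples. This is a sketch under that assumption; `disk_rates` is a hypothetical helper, and the second sample below is made up for illustration.

```python
def disk_rates(prev: dict, curr: dict) -> dict:
    """Average read/write throughput in bytes per second between two samples.

    Assumes read_size and write_size are running totals. A negative delta
    (for example, after a container restart resets the counters) is
    reported as None instead of a rate.
    """
    elapsed = curr["time"] - prev["time"]
    if elapsed <= 0:
        raise ValueError("curr must be later than prev")
    rates = {}
    for field in ("read_size", "write_size"):
        delta = curr["message"][field] - prev["message"][field]
        rates[field] = delta / elapsed if delta >= 0 else None
    return rates


# First sample uses the values from the example above.
first = {"time": 1554193560, "message": {"read_size": 7029587968, "write_size": 14102749184}}
# Hypothetical follow-up sample taken 60 seconds later.
second = {"time": 1554193620, "message": {"read_size": 7029594112, "write_size": 14102871184}}
print(disk_rates(first, second))
```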

Container Network

The network metrics are available per network interface, such as eth0, eth1, or lo. The captured metrics reflect the bytes, packets, and errors transmitted and received at the time of collection.

Pod/Container Network metrics example:

{
  "time": 1554199020,
  "message": {
    "eth0.rx.bytes": 516319,
    "eth0.rx.packets": 1062,
    "eth0.rx.errors": 0,
    "eth0.tx.bytes": 61578,
    "eth0.tx.packets": 893,
    "eth0.tx.errors": 0,
    "ingestion_time": "2019-04-02T09:57:00+00:00",
    "tag": "tml-log.6a6873b34d5e.metrics.netif"
  }
}

Metric Name	Field Name	Units	Data Type	Notes
Bytes transmitted on a netif_name	<netif_name>.tx.bytes	bytes	Number	Total bytes transmitted on the particular network interface.
Packets transmitted on a netif_name	<netif_name>.tx.packets	packets	Number	Total packets transmitted on the particular network interface.
Errors transmitting packets on a netif_name	<netif_name>.tx.errors	packets	Number	Number of packets that failed to transmit on the particular network interface due to window, carrier, aborted, or heartbeat errors.
Bytes received on a netif_name	<netif_name>.rx.bytes	bytes	Number	Total bytes received on the particular network interface.
Packets received on a netif_name	<netif_name>.rx.packets	packets	Number	Total packets received on the particular network interface.
Errors receiving packets on a netif_name	<netif_name>.rx.errors	packets	Number	Number of packets dropped on receive for the particular network interface.
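Because the network fields are flattened as `<netif_name>.<rx|tx>.<counter>`, a consumer typically regroups them per interface. A minimal sketch, assuming records shaped like the example above; `group_by_interface` and `error_ratio` are hypothetical helpers.

```python
from collections import defaultdict

# Network record copied from the example above.
SAMPLE_NET = {
    "time": 1554199020,
    "message": {
        "eth0.rx.bytes": 516319,
        "eth0.rx.packets": 1062,
        "eth0.rx.errors": 0,
        "eth0.tx.bytes": 61578,
        "eth0.tx.packets": 893,
        "eth0.tx.errors": 0,
        "ingestion_time": "2019-04-02T09:57:00+00:00",
        "tag": "tml-log.6a6873b34d5e.metrics.netif",
    },
}


def group_by_interface(record: dict) -> dict:
    """Regroup flat keys like "eth0.rx.bytes" into {"eth0": {"rx": {"bytes": ...}}}."""
    interfaces = defaultdict(lambda: {"rx": {}, "tx": {}})
    for key, value in record["message"].items():
        # rsplit keeps interface names that themselves contain dots intact.
        parts = key.rsplit(".", 2)
        if len(parts) == 3 and parts[1] in ("rx", "tx"):
            iface, direction, counter = parts
            interfaces[iface][direction][counter] = value
    return dict(interfaces)


def error_ratio(counters: dict) -> float:
    """Errors per packet for one direction; 0.0 if no packets were counted."""
    packets = counters.get("packets", 0)
    return counters.get("errors", 0) / packets if packets else 0.0


for iface, dirs in group_by_interface(SAMPLE_NET).items():
    print(iface, dirs["rx"]["bytes"], dirs["tx"]["bytes"], error_ratio(dirs["rx"]))
```

Non-counter fields such as ingestion_time and tag do not match the three-part pattern and are skipped.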

Common Process Metrics

{
  "time": 1554199440,
  "message": {
    "alive": true,
    "proc_name": "td-agent-bit",
    "pid": 2156,
    "mem.VmPeak": 83856000,
    "mem.VmSize": 83852000,
    "mem.VmLck": 0,
    "mem.VmHWM": 7416000,
    "mem.VmRSS": 3412000,
    "mem.VmData": 31028000,
    "mem.VmStk": 132000,
    "mem.VmExe": 4184000,
    "mem.VmLib": 5352000,
    "mem.VmPTE": 140000,
    "mem.VmSwap": 2040000,
    "fd": 65,
    "ingestion_time": "2019-04-02T10:04:00+00:00",
    "tag": "tml-log.6a6873b34d5e.metrics.proc.td-agent-bit"
  }
}

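Metric Name	Field Name	Units	Data Type	Notes
Process status	alive		Boolean	Is the process running?
Process name	proc_name		String	Name of the process as identified by /proc/pid/cmd
Process ID	pid		Number	ID of the monitored process
Peak virtual memory usage	mem.VmPeak	bytes	Number	Peak virtual memory size of the process so far
Virtual memory size	mem.VmSize	bytes	Number	Current virtual memory size
Current mlocked memory	mem.VmLck	bytes	Number	Amount of memory locked by the process. This memory is released after the process exits.
Peak RAM used	mem.VmHWM	bytes	Number	Peak resident set size ("high water mark")
Current RAM being used	mem.VmRSS	bytes	Number	Current resident set size
Size of data segment	mem.VmData	bytes	Number
Size of stack	mem.VmStk	bytes	Number
Size of text segment	mem.VmExe	bytes	Number	Size of the executable code
Shared library memory usage	mem.VmLib	bytes	Number	Size of shared library code
Page table entries size	mem.VmPTE	bytes	Number	Memory used by the process's page tables
Current swap space used	mem.VmSwap	bytes	Number	Swapped-out virtual memory
Open file descriptors	fd		Number	Number of file descriptors the process has open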
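The tag field in every example appears to encode where a record came from. Judging only from the examples on this page (the layout is inferred, not documented), it looks like `<container>.<container id>.metrics.<metric type>`, with a trailing process name for process metrics. A hypothetical `parse_tag` helper under that assumption:

```python
def parse_tag(tag: str) -> dict:
    """Split a metrics tag such as "tml-log.6a6873b34d5e.metrics.proc.td-agent-bit".

    Assumed layout (inferred from the examples, not documented):
    <container>.<container id>.metrics.<metric type>[.<process name>]
    maxsplit=4 keeps any dots inside the process name intact.
    """
    parts = tag.split(".", 4)
    if len(parts) < 4 or parts[2] != "metrics":
        raise ValueError(f"unexpected tag format: {tag!r}")
    return {
        "container": parts[0],
        "container_id": parts[1],
        "metric_type": parts[3],
        "process": parts[4] if len(parts) == 5 else None,
    }


print(parse_tag("tml-log.6a6873b34d5e.metrics.proc.td-agent-bit"))
print(parse_tag("tml-nosql.6c680d874676.metrics.cpu"))
```

Grouping process records by `(container, process)` this way is one option for tracking each process's alive and memory history separately.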

Process List

Processes on all containers

Process Name	Description
Containeragent	The Mashery Local container agent that manages all processes running inside a container.
td-agent-bit	The log and metrics forwarder. It forwards all logs to the log service.
syslog-ng	Runs as a supervisor process plus a worker process.

Per Container Processes

Container Name	Process Name	Description
TM	proxy	Traffic Manager proxy (embedded Jetty)
Sql	Jetty	On-Prem Loader; syncs with MOM in tethered mode
Sql	mysqld	MySQL database server
NoSql (seed and non-seed)	Cassandra	Cassandra database
NoSql (only on non-seed)	Jetty	Jetty server hosting the ML Registry Java webapp
Cache	Memcached	6 processes, one for each pool: 11211, 11212, 11213, 11214, 11215, and 11216
Cache	pxrt	The memcache loader, which keeps memcache up to date with changes to service definitions, packages, and similar configuration.
API	lighttpd	Web server that runs PHP through CGI
API	memcached	2 processes, one for each pool: 11211 and 11214
API	pxrt	Embedded Jetty server hosting the V3 API
API	php-cgi	About 20 php-cgi worker processes that execute V2 API requests
CM	Jetty	Jetty server hosting the certificate manager Java webapp
logservice	td-agent	Log collector and forwarder. Collects logs from the other containers and forwards them to the user-chosen destination. Runs as 1 supervisor and 9 worker processes.
logservice	java	Process that syncs access logs to TIBCO Cloud Mashery in tethered mode.

Diagnostic Recipe / Alerts

Metric	Field Name / Computation	Notes
Is Process Alive	alive	Process status metrics are captured every minute. Low water mark: the first time `alive=false` is reported. High water mark: `alive=false` continues for the next 5 minutes, that is, for the next 5 times the process metrics are gathered.
Continuous high memory usage	mem.VmHWM / ? > .8	Expected usage and water mark per process are listed below.

Process	Expected Usage	Water Mark
memcached 11214	This memcached pool takes up more memory than the other pools.	low water mark
td-agent-bit	Memory utilization should be on the order of MBs.	low water mark
Traffic Manager (javaproxy)	Memory utilization spikes and dips, but average utilization ranges from mid-range to the low water mark.	low water mark
MySqld		
Cassandra	During replication cycles, Cassandra uses more memory, for example, above the low water mark.	
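The water-mark rules above can be expressed as a small per-process state machine fed one alive sample per minute. This is a sketch under one reading of the rule: "low" fires on the first `alive=false`, and "high" fires once when `alive=false` persists for the next 5 samples as well. `AliveTracker` and `high_memory` are hypothetical names, and `expected_bytes` stands in for the unspecified `?` denominator in the memory rule; it is not defined by this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AliveTracker:
    """Water-mark alerts for one process's `alive` metric (one sample per minute)."""

    high_after: int = 5          # further consecutive alive=false samples before "high"
    consecutive_false: int = 0

    def update(self, alive: bool) -> Optional[str]:
        """Return "low", "high", or None; each alert fires once per outage."""
        if alive:
            self.consecutive_false = 0  # process recovered; reset the outage
            return None
        self.consecutive_false += 1
        if self.consecutive_false == 1:
            return "low"
        if self.consecutive_false == 1 + self.high_after:
            return "high"
        return None


def high_memory(vm_hwm: int, expected_bytes: int, threshold: float = 0.8) -> bool:
    """mem.VmHWM / expected_bytes > threshold; expected_bytes is a caller-chosen baseline."""
    return vm_hwm / expected_bytes > threshold


tracker = AliveTracker()
samples = [True, False, False, False, False, False, False, False, True, False]
print([tracker.update(s) for s in samples])
```

Firing each alert only once per outage avoids re-alerting every minute; an alternative design would keep emitting "high" until the process recovers.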

