FAQs
How do I get Grafana's URL?

How do I label a node for reporting in a k8s or OpenShift cluster?

For a k8s cluster:

kubectl label nodes <nodename> node-name=reporting

For an OpenShift cluster:

oc label nodes <nodename> node-name=reporting

How do I see the labels on a node?

To see the labels on the nodes of a k8s cluster:

kubectl get nodes --show-labels

To see the labels on the nodes of an OpenShift cluster:

oc get nodes --show-labels

Verify that one of the nodes is labelled as "node-name=reporting".

How do I find the node running the reporting pod/container?

To see on which node the reporting pod is running on a k8s cluster:

kubectl get pods -o wide

To see on which node the reporting pod is running on an OpenShift cluster:

oc get pods -o wide

To see on which node the reporting pod is running on a Swarm cluster, run the following command on the node that has a placement constraint for the reporting container:

docker ps

Where can logs for different services in the reporting-pod be checked?

You can check the entry point logs at /var/log/reporting_entrypoint.log to see where the configuration for the different services is picked up from.
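For example, on a k8s cluster you can tail this log from inside the reporting pod. This is a minimal sketch that assumes the pod name reporting-set-0-0 (taken from the status FAQ below) and the pod's default container; adjust both to your deployment:

# Show the last 100 lines of the entry point log inside the reporting pod
kubectl exec -it reporting-set-0-0 -- tail -n 100 /var/log/reporting_entrypoint.log
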
How do I change the retention of metrics in Prometheus?

Create a file cleanup_prometheus_data.ini and add the following details:

[DELETION_PERIOD]
# Deletion period of Prometheus data, in days (number of days for which data has to be kept)
VERBOSE_METRICS_DATA_DELETION={number_of_days}
PROCESS_METRICS_DATA_DELETION={number_of_days}

Save this file and follow the instructions in the TML-Reporting Configuration topic to apply the changes during deployment.
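For reference, a filled-in sketch of the same file with an illustrative retention of 15 days for both settings (pick values that suit your own storage budget):

[DELETION_PERIOD]
# Keep verbose and process metrics for 15 days (illustrative value)
VERBOSE_METRICS_DATA_DELETION=15
PROCESS_METRICS_DATA_DELETION=15
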
How do I change the retention of metrics in Loki?

Create a file loki-docker-config.yaml and paste the following content. Replace each "{hours e.g. 96h}" placeholder with the number of hours for which the application logs should be kept:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0
  chunk_target_size: 1536000

schema_config:
  configs:
    - from: 2020-07-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: {hours e.g. 96h}

storage_config:
  boltdb:
    directory: /mnt/data/loki/index
  filesystem:
    directory: /mnt/data/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: {hours e.g. 96h}
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 16

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: true
  retention_period: {hours e.g. 96h}

Save this file and follow the instructions in the Loki Configuration topic to apply the changes during deployment.

In which zone will the reporting-pod be deployed in a multi-zone environment?

In a multi-zone environment, the first zone value (in the array of zone names) given in the manifest file needs to be used for labelling. This is the default zone in which reporting would be deployed, provided the node in that zone is properly labelled, as specified in the Prerequisites section.
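As a sketch, assuming the standard Kubernetes zone label key (your cluster may use a different one), you could list each node's zone and then apply the reporting label to a node in the first zone from the manifest:

# Show the zone of each node (the label key topology.kubernetes.io/zone is an assumption; adjust to your cluster)
kubectl get nodes -L topology.kubernetes.io/zone

# Label a node that belongs to the first zone listed in the manifest
kubectl label nodes <nodename-in-first-zone> node-name=reporting
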
Will the reporting-pod be displayed in the "cluster manager ls components" command?

No. The reporting pod/container is not managed by TML-cluster, so it won't be listed by Cluster Manager's list components command.

How do I check the reporting-pod's status?

For a k8s cluster:

kubectl get pods | grep -i reporting-set-0-0

For an OpenShift cluster:

oc get pods | grep -i reporting-set-0-0

For a Swarm cluster, navigate to the node which hosts the reporting pod, then run the command:

docker ps -a | grep -i reporting

How is the QPS rate calculated?

QPS is the rate of traffic calls per second. It is a rate function calculated over a sliding window (default 5 minutes). In a sliding window of 5 minutes (300 seconds), the QPS is computed as `total_traffic_in_5_mins/(5*60)`. Under low traffic conditions, the actual QPS is near zero and shows up as fractional or decimal values on the graph. For example, for 60 calls in 5 minutes, the QPS would be 60/(5*60), i.e. 1/5 or 0.20. For 18K calls in 5 minutes, the QPS would be 18000/(5*60), i.e. 60 QPS. For finer granularity, reduce the window size (minimum 1 minute).
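As a quick sanity check of the arithmetic above (throwaway shell commands, not part of the product):

echo "scale=2; 60 / (5 * 60)" | bc      # 60 calls in 5 minutes  -> .20 QPS
echo "scale=2; 18000 / (5 * 60)" | bc   # 18K calls in 5 minutes -> 60.00 QPS
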
Why is CPU percentage going beyond 100%?

%CPU (CPU usage) is the percentage of your CPU that is being used by the process. By default, top displays this as a percentage of a single CPU. On multi-core systems, the percentage can therefore be greater than 100%. For example, if 3 cores are at 60% use, top will show a CPU use of 180% for that process. The Total CPU Usage graph is the sum of the CPU usage of the individual processes running on that pod/container.

Why is there a different value for uptime metrics if the time range is changed?

Grafana changes the interval at which data is fetched for a graph when the query's time range exceeds 24 hours. When the selected time range is greater than 1 day, the step interval increases from the default of 1 minute to a larger interval of aggregated data; this reduces the number of data points fetched from Prometheus and therefore the turnaround time of the query. Because of the larger step interval, the most recently fetched data point may be older, so the graph can display different values depending on the step. For example, if the step interval is 2 minutes, the last data point fetched would be from minute n-2, where n is the current minute, and the single-stat panel would display only that data point. The result varies with the time range selected in the dashboard.
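To see the effect of the step interval directly, you can query Prometheus' range API yourself. This is only a sketch: the host, dates, and the built-in `up` metric are placeholders, and the reporting dashboards query their own metrics.

# Fetch a range of samples with an explicit 2-minute step
curl -G 'http://<prometheus-host>:9090/api/v1/query_range' \
  --data-urlencode 'query=up' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-02T00:00:00Z' \
  --data-urlencode 'step=120s'
# With a 2-minute step, the newest returned point can be up to 2 minutes old.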