Define Problem Areas
The first step in any monitoring strategy is to review resource contention issues and other problems that have occurred in the past. Make a list of questions you would like to answer about your network, systems, and applications, problems you would like to solve, and situations you would like to avoid. For example:
• Be notified when an internal application metric exposed through AMI is in a problem state.
• Be notified when critical processes fail or consume too many system resources, such as memory or CPU cycles.
• Be notified when disk space is low, or when it is decreasing at some critical rate.
• Gain more control over processes that slow down the system.
• Avoid multiple servers going down on the same day.
• Be notified when important messages appear in log files or event logs.
When the list is complete, assign a priority to each item. Decide which improvements provide the greatest benefit to your enterprise, then rank items in order of importance.
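To show how one item from such a list can become a concrete check, the following is a minimal Python sketch of the disk space example: it flags a volume when free space is low or when free space is shrinking faster than a chosen rate. The names `DiskSample`, `sample_disk`, and `check_disk`, and both threshold defaults, are illustrative assumptions and are not part of any particular monitoring product.

```python
import shutil
import time
from dataclasses import dataclass
from typing import List


@dataclass
class DiskSample:
    """One observation of free space on a volume (hypothetical helper type)."""
    timestamp: float  # seconds since the epoch
    free_bytes: int


def sample_disk(path: str) -> DiskSample:
    """Take a free-space reading for the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    return DiskSample(timestamp=time.time(), free_bytes=usage.free)


def check_disk(
    previous: DiskSample,
    current: DiskSample,
    total_bytes: int,
    min_free_fraction: float = 0.10,                # assumed threshold: under 10% free
    max_loss_bytes_per_hour: float = 1 * 1024**3,   # assumed threshold: over 1 GiB lost per hour
) -> List[str]:
    """Return the names of the alert conditions that are currently true."""
    alerts = []

    # Condition 1: disk space is low.
    if current.free_bytes / total_bytes < min_free_fraction:
        alerts.append("low_space")

    # Condition 2: free space is decreasing at a critical rate.
    elapsed_hours = (current.timestamp - previous.timestamp) / 3600
    if elapsed_hours > 0:
        loss_per_hour = (previous.free_bytes - current.free_bytes) / elapsed_hours
        if loss_per_hour > max_loss_bytes_per_hour:
            alerts.append("shrinking_fast")

    return alerts
```

In practice, a scheduler would call `sample_disk` at a fixed interval, keep the previous reading, and pass both readings to `check_disk` along with the volume's total size from `shutil.disk_usage(path).total`. Both thresholds should be tuned to the priorities you assigned above.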