Define Problem Areas

The first step in any monitoring strategy is to review resource contention issues and other problems that have occurred in the past. Make a list of questions you would like to answer about your network systems and applications, problems you would like to solve, and situations you would like to avoid. For example:

Be notified when an internal application metric exposed through AMI is in a problem state.
Be notified when critical processes fail or consume too many system resources, such as memory or CPU cycles.
Be notified when disk space is low, or when it is decreasing at some critical rate.
Gain more control over processes that slow down the system.
Avoid multiple servers going down on the same day.
Be notified when important messages appear in log files or event logs.

When the list is complete, assign a priority to each item: decide which improvements would provide the greatest benefit to your enterprise, then rank the items in that order.
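
To make an item on the list concrete enough to rank, it can help to write down what "problem state" would mean in practice. The following is a minimal sketch for the disk space item only, not the method of any particular monitoring product. The function names `sample_disk` and `disk_problems` and both thresholds (10% free, 2% of capacity lost per hour) are hypothetical values chosen for illustration.

```python
import shutil
import time


def sample_disk(path="/"):
    """Take one sample: (timestamp in seconds, free bytes, total bytes)."""
    usage = shutil.disk_usage(path)
    return (time.time(), usage.free, usage.total)


def disk_problems(samples, min_free_fraction=0.10, max_drop_per_hour=0.02):
    """Classify a series of disk samples, oldest first.

    Returns a list that may contain:
      "low"        - the newest sample has less free space than
                     min_free_fraction of the disk (assumed threshold)
      "decreasing" - free space fell faster than max_drop_per_hour,
                     as a fraction of capacity per hour, between the
                     oldest and newest samples (assumed threshold)
    """
    problems = []
    t_last, free_last, total_last = samples[-1]
    free_fraction = free_last / total_last

    if free_fraction < min_free_fraction:
        problems.append("low")

    # A rate needs at least two samples separated in time.
    if len(samples) >= 2:
        t_first, free_first, total_first = samples[0]
        hours = (t_last - t_first) / 3600.0
        if hours > 0:
            drop_per_hour = (free_first / total_first - free_fraction) / hours
            if drop_per_hour > max_drop_per_hour:
                problems.append("decreasing")

    return problems
```

For example, collecting `sample_disk()` at a regular interval and passing the accumulated samples to `disk_problems` gives a list of conditions to be notified about. Writing each item from the list as a check of this kind also shows which thresholds still need to be decided before priorities are assigned.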