Jiafan Zhou wrote:
One way I can imagine is to monitor the CPU usage of such a component or its memory footprint. Is that correct?
Yes. We also monitor ping (to make sure the network and hardware are up), HTTP (to make sure the server is responding) and response time (to make sure it's not overwhelmed). We also have some pages that our monitors scan for a particular piece of text that has to be retrieved from the database, so we can be sure our whole application stack is working.
We use a combination of Nagios and ipMonitor (ipMonitor has more tools for monitoring Windows machines), which send alerts via email, pager and SMS.
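For anyone who wants to see what the page-scan check looks like in practice, here is a rough sketch in Python. It is not how Nagios or ipMonitor do it internally; the function names, the thresholds and the OK/WARNING/CRITICAL wording are assumptions made up for illustration. It fetches a page, confirms the HTTP status, looks for a marker string that only appears if the database query succeeded, and flags slow responses.

```python
import time
import urllib.error
import urllib.request


def evaluate(status, body, elapsed, expected_text, max_seconds):
    """Classify one probe result (hypothetical helper, not a Nagios API)."""
    if status != 200:
        return "CRITICAL: HTTP status %s" % status
    if expected_text not in body:
        # The server answered, but the database-backed text is missing.
        return "CRITICAL: expected text not found"
    if elapsed > max_seconds:
        return "WARNING: slow response (%.2fs)" % elapsed
    return "OK"


def check_page(url, expected_text, max_seconds=2.0, timeout=10.0):
    """Fetch url and check status, content and response time.

    max_seconds and timeout are illustrative defaults, not recommendations.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # Raised for 4xx/5xx responses.
        return "CRITICAL: HTTP status %s" % exc.code
    except OSError as exc:
        # Covers URLError, connection failures and timeouts.
        return "CRITICAL: request failed (%s)" % exc
    elapsed = time.monotonic() - start
    return evaluate(status, body, elapsed, expected_text, max_seconds)
```

You would call it as something like check_page("http://your-server/healthcheck", "db-ok"), where both the URL and the marker text are placeholders for whatever your own application serves, and then feed the result into whatever sends your alerts.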