So far, all network monitoring solutions I’ve tested suck.

  • Nagios, the dinosaur of network monitoring, is a pain to configure, and about half of its features feel incomplete to me. It also spawns a lot of processes, which probably isn’t very efficient. I find the UI pretty much useless, which is why I rely mostly on “naglite” (a dirty PHP script) to provide a status screen. I don’t think it has any “numerical” monitoring for things like disk space; you can only configure alerts for when a value hits a certain threshold. I have to admit that Nagios is very powerful when it comes to alerting people based on schedules etc. That is probably why it’s #1 for monitoring: it can be set up to send an alert to exactly the right person on duty… Another good feature is the scheduling of downtimes.
  • Munin, formerly known as LRRD, is a general-purpose graphing tool. It’s especially useful in combination with Nagios, since it provides some features Nagios is missing and duplicates little. Munin is nice for doing e.g. a usage analysis of your VPN or graphing network load. However, it doesn’t do trend analysis or anything more complex.
  • Zabbix is written in PHP, and apparently suffers from the same problems as most PHP programs: it looks great, but lots of stuff doesn’t work, or doesn’t work correctly. To me it felt like it was developed without a master plan or any kind of software engineering in the background, just by randomly adding and changing features. I found configuring it a pain, and I was tempted to write my own configuration tool for it. But maybe they’ve changed their configuration again; they were doing that in pretty much every version back then (one of the reasons why I felt it was developed without much of a plan). I also had the impression that its monitoring is quite inefficient and not capable of handling irregularities in scheduling well. So as long as your network is working great, it will probably work as well, but if you have some serious network problems, it may get totally messed up.
  • NetMRG can plot you all kinds of graphs you do not need.
  • Tons of other monitoring tools I tested had random problems, from only being able to graph data without assigning any meaning to it, to being a Java GUI app…

What I’m looking for is a network monitoring solution that

  • Allows me to set up monitoring with the host/service model I use for my tasks, ideally even allowing me to easily move a service from one host to another. Should be class based, and allow multiple classes per system.
  • Understands that some services are important (e.g. firewall), and others are not (paper availability in the printers)
  • Doesn’t waste my time with millions of configuration settings, network mapping, SNMP digits, listing every single SNMP value…
  • Doesn’t bother me with all the thousands of values it could collect
  • Handles dependencies, and doesn’t alert me separately for obvious consequences
  • Has a good alerting system, with little to set up
  • Has a good monitoring scheduler that will adjust its monitoring interval to the current monitoring load (e.g. reduce scheduling density when all services are timing out, do multithreaded pinging / querying of hosts, avoid busy polling, use push instead of poll where possible)
  • Has a very minimalistic status screen, as long as everything is okay it should just say so; if something is down, it should show some report on what is wrong, but never ever fill the screen. The main monitoring screen doesn’t have a mousewheel or any other input device! (Test: should be useful even without colors!)
  • Supports multi-host monitoring with synchronization, so I can e.g. have a monitoring service in the internal network and one in the DMZ. If they are disconnected, they monitor independently and merge data afterwards.
  • Does real trend analysis on numeric values, with some prediction model, for e.g. disk space. Disk space usage often follows daily, weekly and monthly patterns, especially on log file partitions. The monitoring tool should alert me if it looks like the disk is going to be full “anytime soon”, not just when it’s too late!
  • Not written in a bad language such as PHP, efficient, not spawning thousands of processes…
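
To make the dependency point from the list above a bit more concrete, here is a minimal sketch in Python of what “don’t alert me separately for obvious consequences” could mean. The service names and the DEPENDS_ON map are made up for illustration, and it assumes each service has at most one parent and that the dependency chain has no cycles; a real tool would need a richer model.

```python
# Hypothetical dependency map: each service names the one service it
# depends on, or None if it depends on nothing we monitor.
DEPENDS_ON = {
    "firewall": None,
    "dmz-switch": "firewall",
    "webserver": "dmz-switch",
    "mailserver": "dmz-switch",
    "printer-paper": None,
}


def root_causes(down, depends_on):
    """Return the down services that have no failed ancestor.

    Only these are worth an alert; everything else that is down is
    an obvious consequence of one of them. Assumes no cycles.
    """
    roots = set()
    for service in down:
        parent = depends_on.get(service)
        # Walk up the dependency chain until we find a failed ancestor
        # or run out of ancestors.
        while parent is not None and parent not in down:
            parent = depends_on.get(parent)
        if parent is None:
            roots.add(service)
    return roots


if __name__ == "__main__":
    down_now = {"dmz-switch", "webserver", "mailserver"}
    print(sorted(root_causes(down_now, DEPENDS_ON)))
```

With the switch, web server and mail server all down, only the switch is reported, which is the one thing I would actually want to be paged about.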

Pretty much every monitoring solution I’ve seen so far is great at collecting tons of data, but doesn’t help me with actually handling this amount of data.

Does anyone have a recommendation for a good network monitoring tool?

I’d be really interested in working on the trend analysis point: a real statistical analysis for network monitoring. This would be so useful… predicting peaks in network usage, predicting when a system will be overloaded or a disk full… but I’m phasing out of network administration, so my interest here is mostly in being able to give advice to others.
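
As a very rough illustration of the “disk full anytime soon” idea, the sketch below fits a straight line through recent usage samples with ordinary least squares and extrapolates to the point where the disk would be full. This is only a stand-in, not the prediction model I have in mind: a plain linear fit ignores the daily, weekly and monthly patterns mentioned above. The function name, the sample format (hour, used bytes) and the numbers are assumptions made up for the example.

```python
def hours_until_full(samples, capacity):
    """Estimate hours from the last sample until usage reaches capacity.

    samples: list of (hour, used) pairs, ordered by time.
    Returns None if there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same time
    # Least-squares slope: growth in usage per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage, nothing to predict
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, t_full - last_t)


if __name__ == "__main__":
    # Made-up data: usage grows by 10 units per hour, disk holds 100.
    usage = [(0, 10), (1, 20), (2, 30), (3, 40)]
    remaining = hours_until_full(usage, capacity=100)
    if remaining is not None and remaining < 24:
        print(f"disk predicted full in about {remaining:.1f} hours")
```

An alert rule would then compare the estimate against a horizon (24 hours in the usage example) instead of against a fixed fill percentage.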

[Update: no, you don’t need to point me to Cacti. It’s just another grapher and data collector that doesn’t actually do what I would call ‘monitoring’. It also doesn’t seem to have a smart scheduler, and is written in PHP (which is bad!). I was also told a non-success story about OpenNMS, which just crashed with a totally unhelpful stack trace when a host was added to be monitored. I had a look at their online demo, but it felt very complex and not very useful to me…]