Sep 24 2010
 

Earlier today, I noticed the following output from a Bacula job:

24-Sep 14:14 bacula-dir JobId 38548: Start Backup JobId 38548, Job=latens_home.2010-09-24_14.12.38_31
24-Sep 14:14 bacula-dir JobId 38548: Using Device “MegaFile-latens”
24-Sep 14:09 latens-fd JobId 38548: DIR and FD clocks differ by -307 seconds, FD automatically compensating.

That’s just over 5 minutes. The two clocks shouldn’t differ by that much.

So I started ntpd. That’s when I noticed it was not being started by /etc/rc.conf. I didn’t think much more of it at the time. Later, I thought: let’s add that to my Nagios configuration. First stop: check nrpe2.cfg and see if there’s an entry for check_ntp. There was. Oh… well, let’s check Nagios. Oh, ntpd is already monitored. OK, what’s going on here?

Let’s see what the check returns when ntpd is stopped:

$ /usr/local/libexec/nagios/check_nrpe2 -H latens-vpn -c check_ntp
/var/run/nrpe2.pid OK – 974 ?? Ss 3:20.45 /usr/local/sbin/nrpe2 -d -c /usr/local/etc/nrpe.cfg

Hmm, it says it’s running. Let’s see what’s on the client:

$ grep ntp /usr/local/etc/nrpe.cfg
command[check_ntp]=/usr/local/libexec/nagios/check_pid_sudo /var/run/nrpe2.pid

Oh, that’s the wrong pid file. This is checking nrpe, not ntpd. The correct command is:

command[check_ntp]=/usr/local/libexec/nagios/check_pid_sudo /var/run/ntpd.pid

After amending the sudoers file to account for the corrected command, things run correctly on the client:

$ /usr/local/libexec/nagios/check_pid_sudo /var/run/ntpd.pid
/var/run/ntpd.pid OK – 81358 ?? Ss 0:00.02 /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift

And on the Nagios server:

$ /usr/local/libexec/nagios/check_nrpe2 -H latens-vpn -c check_ntp
/var/run/ntpd.pid OK – 81358 ?? Ss 0:00.02 /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift

I checked the output of all other ntpd checks within Nagios. I found no other references to nrpe….
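A mix-up like this can also be hunted for mechanically. The following is only a sketch: the function name audit_nrpe_pid_checks is made up for illustration, and the rule it applies (a command named check_<service> should point at a pid file whose name contains <service>) is an assumed naming convention, not anything NRPE enforces.

```shell
#!/bin/sh
# Sketch: flag NRPE pid-file checks whose pid file does not mention the
# service named in the command. The check_<service> naming convention is
# an assumption; audit_nrpe_pid_checks is a hypothetical helper.
audit_nrpe_pid_checks() {
    cfg="$1"
    # Match lines such as:
    #   command[check_ntp]=/path/to/check_pid_sudo /var/run/ntpd.pid
    # and reduce them to "<service> <pidfile>".
    sed -n 's/^command\[check_\([^]]*\)]=.*check_pid_sudo[[:space:]]\{1,\}\([^[:space:]]*\).*$/\1 \2/p' "$cfg" |
    while read -r service pidfile; do
        case "$(basename "$pidfile")" in
            *"$service"*) ;;   # pid file name mentions the service: looks fine
            *) echo "MISMATCH: check_$service uses $pidfile" ;;
        esac
    done
}
```

Called as audit_nrpe_pid_checks /usr/local/etc/nrpe.cfg, this would have reported the original check_ntp line, because nrpe2.pid does not contain ntp. It is a heuristic only: a service whose pid file is legitimately named differently would be flagged as well.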


  3 Responses to “ntp wasn’t running but Nagios didn’t notice”

  1. So is that a problem with the check_ntp script as distributed, or a mistake that has been user-induced?

  2. The problem was with me. I added the incorrect line to nrpe2.cfg.

  3. I think what I needed to do was:

    command[check_ntp]=/usr/local/libexec/nagios/check_pid_sudo /var/run/ntpd.pid

    At least, that’s what’s running now.