Oct 29 2019
 

I’ve been seeing check_bacula “exited on signal 11” messages in /var/log/messages on slocum for as long as I can remember.

Today I found out why those errors are occurring.

They are logged on slocum, the FreeBSD host for the jail in which my Nagios instance runs. I had been ignoring the messages, but today they got to me: I was having trouble getting a new FreeBSD port to work, so I was easily distracted.

In this post:

  • FreeBSD 12.0
  • Nagios 3.5.1
  • nagios-check_bacula9 9.4.3

Background

My Nagios instance uses net-mgmt/nagios-check_bacula9 to verify that various Bacula components are running. The command looks something like this:

define command {
  command_name  check_bacula_fd
  command_line  $USER1$/check_bacula -H $HOSTADDRESS$ -D fd -M monitor -K 'password'
  register                        1
}

The following explains the parameters:

[dan@webserver:~/tmp] $ /usr/local/libexec/nagios/check_bacula -h
Copyright (C) 2005 Christian Masopust
Written by Christian Masopust (2005)

Version: 9.4.3 (02 May 2019) amd64-portbld-freebsd12.0 freebsd 12.0-RELEASE-p5

Usage: check_bacula [-d debug_level] -H host -D daemon -M name -P port
       -H <host>     hostname where daemon runs
       -D <daemon>   which daemon to check: dir|sd|fd
       -M <name>     name of monitor (as in bacula-*.conf)
       -K <md5-hash> password for access to daemon
       -P <port>     port where daemon listens
       -dnn          set debug level to nn
       -?            print this message.

[dan@webserver:~/tmp] $ 

Tracking it down

I tried enabling debug in Nagios. Too much information. I tried disabling the checks on all the Bacula services. No luck. Something was still invoking the check and I could not see from where.

Finally, I found it. I have two hosts which are powered off: tape01 and tape02.

Within Nagios, to avoid spamming myself with notifications, both of these hosts are set with:

  • Notifications for this host have been disabled
  • Checks of this host have been disabled

I noticed that checks for bacula-fd were still enabled on those two hosts. I disabled the checks and waited.

The log entries stopped.

Why?

If the host is unreachable, check_bacula crashes with a segmentation fault (signal 11) when checking for bacula-fd or bacula-sd, but not when checking for bacula-dir.

Here are my test results:


[dan@webserver:/usr/local/etc/nagiosql] $ /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D fd -M user -K 'password' ; date
Segmentation fault
Tue Oct 29 19:50:10 UTC 2019

[dan@webserver:/usr/local/etc/nagiosql] $ /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D sd -M user -K 'password' ; date
Segmentation fault
Tue Oct 29 20:40:02 UTC 2019

[dan@webserver:/usr/local/etc/nagiosql] $ /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D dir -M user -K 'password' ; date
BACULA CRITICAL - Cannot authenticate to Director: 
Tue Oct 29 20:41:30 UTC 2019
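A crash like this can also be seen in the exit status: when a process is killed by a signal, the shell reports 128 plus the signal number, so signal 11 shows up as 139. Nagios plugins are expected to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The following is only a sketch of a wrapper that turns a signal death into UNKNOWN; run_check is a hypothetical name, not part of check_bacula or Nagios, and it does not stop the crash or the kernel log entry.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of check_bacula or Nagios): run a plugin
# and map "killed by a signal" to the Nagios UNKNOWN status (3).
run_check() {
  # Capture the plugin's output and its exit status.
  output=$("$@" 2>&1)
  rc=$?

  # Assumption: a status above 128 means the plugin died from a signal
  # (128 + signal number), since well-behaved plugins only exit with 0-3.
  if [ "$rc" -gt 128 ]; then
    echo "BACULA UNKNOWN - plugin killed by signal $((rc - 128))"
    return 3
  fi

  # Otherwise pass the plugin's own output and status through unchanged.
  echo "$output"
  return "$rc"
}
```

It would be used by prefixing the real command, for example: run_check /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D fd -M user -K 'password'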

Here are the entries from /var/log/messages which match the first two errors above:

Oct 29 19:50:10 slocum kernel: pid 600 (check_bacula), uid 1001: exited on signal 11
Oct 29 20:40:02 slocum kernel: pid 46828 (check_bacula), uid 1001: exited on signal 11
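To see at a glance which programs are crashing and how often, the kernel lines can be summarized. This is a minimal sketch that assumes the log line format shown above; segfault_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print "count program"
# for every program the kernel reported as exiting on signal 11.
segfault_summary() {
  # Keep only the program name from matching kernel lines.
  sed -n 's/.*kernel: pid [0-9]* (\([^)]*\)), uid [0-9]*: exited on signal 11.*/\1/p' |
    sort | uniq -c | awk '{ print $1, $2 }'
}
```

Usage: segfault_summary < /var/log/messages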

The issue has been reported.

Clearing it up

For both hosts, I have since clicked on “Disable checks of all services on this host”.
