I’ve been getting “exited on signal 11” messages for check_bacula in /var/log/messages on slocum for as long as I can remember.
Today I found out why those errors are occurring.
They are logged on the FreeBSD host whose jail runs my Nagios instance. I’ve just been ignoring the messages, but today they got to me. I was having trouble getting a new FreeBSD port to work, so I was easily distracted.
In this post:
- FreeBSD 12.0
- Nagios 3.5.1
- nagios-check_bacula9 9.4.3
Background
My Nagios instance uses net-mgmt/nagios-check_bacula9 to verify that various Bacula components are running. The command looks something like this:
define command {
    command_name  check_bacula_fd
    command_line  $USER1$/check_bacula -H $HOSTADDRESS$ -D fd -M monitor -K 'password'
    register      1
}
The following explains the parameters:
[dan@webserver:~/tmp] $ /usr/local/libexec/nagios/check_bacula -h
Copyright (C) 2005 Christian Masopust
Written by Christian Masopust (2005)
Version: 9.4.3 (02 May 2019) amd64-portbld-freebsd12.0 freebsd 12.0-RELEASE-p5
Usage: check_bacula [-d debug_level] -H host -D daemon -M name -P port
       -H <host>      hostname where daemon runs
       -D <daemon>    which daemon to check: dir|sd|fd
       -M <name>      name of monitor (as in bacula-*.conf)
       -K <md5-hash>  password for access to daemon
       -P <port>      port where daemon listens
       -dnn           set debug level to nn
       -?             print this message.
[dan@webserver:~/tmp] $
Tracking it down
I tried enabling debug in Nagios. Too much information. I tried disabling the checks on all the Bacula services. No luck. Something was still invoking the check and I could not see from where.
Finally, I found it. I have two hosts which are powered off: tape01 and tape02.
Within Nagios, to avoid spamming myself with notifications, both of these hosts are set with:
- Notifications for this host have been disabled
- Checks of this host have been disabled
I noticed that checks for bacula-fd were still enabled on those two hosts. I disabled the checks and waited.
The log entries stopped.
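The mismatch described above (host checks disabled, service checks still active) can also be looked for outside the web interface. The following is a minimal sketch, assuming a Nagios 3 style status file made of `servicestatus { ... }` blocks with `host_name=`, `service_description=` and `active_checks_enabled=` lines. The function name `services_still_checked` is hypothetical, and the location of the status file depends on your nagios.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print the services on one host whose active checks
# are still enabled, according to a Nagios status file.
# Assumes the file consists of "servicestatus { key=value ... }" blocks.
services_still_checked() {
    host=$1
    statusfile=$2
    awk -v host="$host" '
        # Start of a service block: reset the fields we collect.
        /^servicestatus[ \t]*[{]/ { in_svc = 1; h = ""; d = ""; a = ""; next }
        # End of a service block: report it if it matches.
        in_svc && /^[ \t]*[}]/ {
            if (h == host && a == "1") print d
            in_svc = 0
            next
        }
        # Inside a service block: split each line on the first "=".
        in_svc {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") h = val
            else if (key == "service_description") d = val
            else if (key == "active_checks_enabled") a = val
        }
    ' "$statusfile"
}
```

Calling it as `services_still_checked tape01 /path/to/status.dat` would list any service on tape01 that Nagios is still actively checking.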
Why?
If the host is unreachable, check_bacula crashes with a segmentation fault when checking bacula-fd or bacula-sd, but not when checking bacula-dir.
Here are my test results:
[dan@webserver:/usr/local/etc/nagiosql] $ /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D fd -M user -K 'password' ; date
Segmentation fault
Tue Oct 29 19:50:10 UTC 2019
[dan@webserver:/usr/local/etc/nagiosql] $ /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D sd -M user -K 'password' ; date
Segmentation fault
Tue Oct 29 20:40:02 UTC 2019
[dan@webserver:/usr/local/etc/nagiosql] $ /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D dir -M user -K 'password' ; date
BACULA CRITICAL - Cannot authenticate to Director:
Tue Oct 29 20:41:30 UTC 2019
Here are the entries from /var/log/messages which match the first two errors above:
Oct 29 19:50:10 slocum kernel: pid 600 (check_bacula), uid 1001: exited on signal 11
Oct 29 20:40:02 slocum kernel: pid 46828 (check_bacula), uid 1001: exited on signal 11
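To pull entries like these out of a busy log, a small filter helps. This is a minimal sketch: the function name `list_check_bacula_crashes` is hypothetical, and the pattern is based only on the two log lines shown here, so other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: list kernel log lines that record check_bacula
# exiting on signal 11. The log file is an argument so the function can
# also be pointed at rotated copies; it defaults to /var/log/messages.
list_check_bacula_crashes() {
    logfile=${1:-/var/log/messages}
    grep -E 'kernel: pid [0-9]+ \(check_bacula\), uid [0-9]+: exited on signal 11' "$logfile"
}
```

Comparing the timestamps it prints with the Nagios check schedule is one way to narrow down which service check is responsible.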
The issue has been reported.
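Until a fix is available, one possible stopgap is to wrap the plugin so that a crash becomes an ordinary Nagios result. This is only a sketch, not something from the port: the function name `run_check` is hypothetical, mapping the crash to UNKNOWN (exit code 3) is my choice, and it assumes the shell reports a process killed by signal N as exit status 128+N (139 for signal 11).

```shell
#!/bin/sh
# Hypothetical wrapper: run a check command and turn a signal 11 crash
# into a Nagios plugin result.
# Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
run_check() {
    # Run the command given as arguments, capturing stdout and stderr.
    output=$("$@" 2>&1)
    status=$?

    # Assumption: the shell reports "killed by signal N" as 128+N,
    # so signal 11 shows up as exit status 139.
    if [ "$status" -eq 139 ]; then
        echo "BACULA UNKNOWN - check crashed with signal 11 (host unreachable?)"
        return 3
    fi

    # Otherwise pass the plugin's own output and exit status through.
    if [ -n "$output" ]; then
        echo "$output"
    fi
    return "$status"
}
```

For example, `run_check /usr/local/libexec/nagios/check_bacula -H 172.16.0.1 -D fd -M user -K 'password'` would print the UNKNOWN line instead of crashing out of the check. The kernel still logs the signal 11 message, because the plugin itself still crashes.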
Clearing it up
I have since clicked on “Disable checks of all services on this host” for both hosts.