Monitoring FreeBSD jails from the host

It was May 2021 when I tweeted about monitoring FreeBSD jails which had jail IP addresses only in the 127.0.0.0/8 range. Yesterday, nearly 6 months later, I did the first test of this. This came up because I’m getting a new FreshPorts node ready.

I’ve created a script in the jail to be run from the host. That script runs in the jail but is initiated by a process on the host.

In this post:

FreeBSD 13.0
Nagios 3.5.1
nrpe 3.2.1
ingress refers to a jail which processes commits to the FreeBSD repos and loads them into the FreshPorts database
/usr/local/libexec/nagios-custom is a directory I use to store my custom Nagios scripts.

Installation of the above is outside the scope of this post. It is presumed that you have a working Nagios and nrpe installed.

The usual situation

Until now, each of my jails has had an RFC 1918 IP address (e.g. 10.0.0.2) and net-mgmt/nrpe3 runs within the jail. nagios, running on my webserver, contacts the jail via port 5666 where nrpe3 is listening. nrpe3 runs a script (which is installed in the jail) and sends the results back to nagios.

What will be different?

The jails will have IP addresses in the 127.0.0.0/8 range (e.g. 127.63.0.10)

The main difference is what nagios contacts. Instead of contacting the jail, it will contact the host. nrpe on the host will use jexec to run a script located in the jail.

The proof of concept

This section outlines the proof of concept I did. I will go through a single monitoring check showing all the pieces installed.

I started with the ingress01 jail. This is the only IP address in the jail:

lo1: flags=8049 metric 0 mtu 16384
	options=680003
	inet6 fd80::10 prefixlen 8
	inet 127.163.0.10 netmask 0xffffffff
	groups: lo

All the monitoring scripts required for this jail are already installed in this jail. I picked one: /usr/local/libexec/nagios-custom/check_freshports_online. That script looks like this:

[r720-02 dan ~] % cat /jails/ingress01/usr/local/libexec/nagios-custom/check_freshports_online
#!/bin/sh

if [ -f /usr/websites/freshports.org/scripts/OFFLINE ]
then
  STATUS='offline'
else
  STATUS='online'
fi

case "${STATUS}" in
    "online")
        echo "OK: ${STATUS}"
        exit 0
        ;;
    "offline")
        echo "WARNING: ${STATUS}"
        exit 1;
        ;;
    *)
        echo "CRITICAL: unknown"
        exit 3;
        ;;
esac

[r720-02 dan ~] %

This script just checks the existence of a file, which I think should be over in /var/db/freshports not /usr/websites/freshports.org/scripts, but that fix is for another day.
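One way to make a later move of that file painless is to let the path be overridden. The following is a minimal sketch, not the script in use: the OFFLINE_FLAG environment variable is a hypothetical name introduced here for illustration, and the default remains the path the existing check uses. The return values follow the usual Nagios plugin convention (0 for OK, 1 for WARNING).

```shell
#!/bin/sh
# Sketch only. OFFLINE_FLAG is a hypothetical override, not part of the
# original script; when unset, the existing path is used.

check_freshports_online() {
    flag="${OFFLINE_FLAG:-/usr/websites/freshports.org/scripts/OFFLINE}"

    if [ -f "${flag}" ]; then
        echo "WARNING: offline"
        return 1    # Nagios WARNING
    fi

    echo "OK: online"
    return 0        # Nagios OK
}
```

A real plugin built from this would end with a call to check_freshports_online followed by exit $? so that nrpe sees the status.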

I’ll run that script over here, in the jail:

[r720-02-ingress01 root ~] # /usr/local/libexec/nagios-custom/check_freshports_online
OK: online

The key to this proof of concept is being able to run those scripts from the host. I tried this:

[r720-02 dan ~] % sudo jexec ingress01 /usr/local/libexec/nagios-custom/check_freshports_online
OK: online

That’s it. If your check can be run from the host like this, this approach should work.

Hold on, that’s not quite the same

While writing this, I realized the above test did not replicate what usually happens in my jails.

When nrpe runs, it runs as the nagios user.

The jexec command runs it as root.

We could run all our checks as root, but that’s less than ideal. They are already designed to run as nagios. For security, and less work, let’s still run them as nagios.

Running as nagios

Here is the same command running as the right user:

[r720-02 dan ~] % sudo jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online
OK: online

The -U option specifies the user name from the “jailed environment as whom the command should run”.

Fantastic. I like it.

Why run as Nagios?

The real question is not why run this as nagios.

The real question is: Why would you run this as root?

The jexec command is invoked on the host by the nagios user, which will need sudo permission to do so. That permission will be strict: it will specify the full command which can be run.

There is no need to run this as root in the jail.

The script could be modified to run on the host using the jail PID. However, all of these scripts were designed to run in a jail as nagios. Keeping the scripts consistent, whether they are started from the host or from within the jail, is my preferred approach.

nrpe configuration on the host

My nrpe configuration contains this bit:

[r720-02 dan /usr/local/etc] % tail nrpe.cfg
#include=



# INCLUDE CONFIG DIRECTORY
# This directive allows you to include definitions from config files (with a
# .cfg extension) in one or more directories (with recursion).

#include_dir=
include_dir=/usr/local/etc/nrpe.d

The include_dir directive will pull in any files with a .cfg extension within the /usr/local/etc/nrpe.d directory.

I created /usr/local/etc/nrpe.d/freshports-ingress-via-jexec.cfg:

[r720-02 dan /usr/local/etc/nrpe.d] % cat freshports-ingress-via-jexec.cfg
command[check_freshports_online] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online

freshports-ingress contains the non-jexec version of the checks for a FreshPorts ingress jail. The name of the new file is derived from that.

sudo permission for nagios on the host

The /usr/local/etc/sudoers file on the host contains this line:

#includedir /usr/local/etc/sudoers.d

This means any file in /usr/local/etc/sudoers.d can contain sudo specifications.

Case in point, for the jexec above, we have:

[r720-02 dan /usr/local/etc/sudoers.d] % cat nrpe-freshports-ingress-via-jexec
nagios   ALL=(ALL) NOPASSWD:/usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online

This gives the nagios user permission to invoke that command.
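When a sudoers entry lists command arguments, sudo requires the arguments it is given to match them, so the command in the nrpe configuration has to agree with the sudoers entry word for word. The following is a rough sketch of a helper that compares the two; the function names are made up for this example, it assumes the line formats shown in this post, and it does not understand sudoers wildcards.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; no sudoers wildcard support.

# Extract what sudo is asked to run from an nrpe "command[...] = ..." line.
nrpe_sudo_command() {
    printf '%s\n' "$1" | sed -e 's|^.*/usr/local/bin/sudo[[:space:]]*||'
}

# Extract the permitted command from a "... NOPASSWD:<command>" sudoers line.
sudoers_command() {
    printf '%s\n' "$1" | sed -e 's|^.*NOPASSWD:[[:space:]]*||'
}

# Succeed only when both lines name exactly the same command and arguments.
commands_match() {
    [ "$(nrpe_sudo_command "$1")" = "$(sudoers_command "$2")" ]
}

nrpe_line='command[check_freshports_online] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online'
sudoers_line='nagios   ALL=(ALL) NOPASSWD:/usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online'

if commands_match "${nrpe_line}" "${sudoers_line}"; then
    echo "match"
else
    echo "MISMATCH"
fi
```

An nrpe line that omits -U nagios would be reported as a mismatch against this sudoers entry, which is the kind of slip that otherwise only shows up as a failed check.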

jail configuration

As I typed this, I realized I need to modify my ansible scripts so that installing only my nagios-custom scripts is possible. However, that is not relevant to this post. I mention it only in case you need to modify your processes.

In the jail I have only this file, already mentioned above:

cat /usr/local/libexec/nagios-custom/check_freshports_online
#!/bin/sh

if [ -f /usr/websites/freshports.org/scripts/OFFLINE ]
then
  STATUS='offline'
else
  STATUS='online'
fi

case "${STATUS}" in
    "online")
        echo "OK: ${STATUS}"
        exit 0
        ;;
    "offline")
        echo "WARNING: ${STATUS}"
        exit 1;
        ;;
    *)
        echo "CRITICAL: unknown"
        exit 3;
        ;;
esac

Testing this with nagios

On my webserver, which has Nagios installed, I tried this:

/usr/local/libexec/nagios/check_nrpe3 -H r720-02.vpn.unixathome.org -c check_freshports_online
OK: online

Great. This should work from within nagios. If you get this:

/usr/local/libexec/nagios/check_nrpe3 -H r720-02.vpn.unixathome.org -c check_freshports_online
NRPE: Unable to read output

You should check /var/log/auth.log on the host and in the jail. It is probably a permissions problem.
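When chasing this kind of failure, it can help to see both the output and the exit status of the command nrpe would run. The following is a small sketch of such a helper; show_check_result is a name invented for this example, and the mapping is the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch only: show_check_result is a hypothetical debugging helper.

# Run a check command, capture stdout and stderr, and translate the exit
# status into the Nagios state name.
show_check_result() {
    output=$("$@" 2>&1) && status=0 || status=$?

    case "${status}" in
        0) state="OK" ;;
        1) state="WARNING" ;;
        2) state="CRITICAL" ;;
        3) state="UNKNOWN" ;;
        *) state="not a plugin status" ;;
    esac

    printf 'exit=%s state=%s output=%s\n' "${status}" "${state}" "${output:-<none>}"
}
```

On the host, as the nagios user, it could be pointed at the full command from the nrpe configuration, for example show_check_result /usr/local/bin/sudo -n /usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online. The -n flag makes sudo fail rather than prompt for a password, and because stderr is captured too, a refusal from sudo becomes visible instead of producing no output at all.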

Not yet deployed

So far, this is only a proof-of-concept, but I plan to deploy it to monitor all three jails on this host.

What will pessimists say? They will say this doesn’t scale. You will need the same command, slightly different, for each jail. If each jail needs to monitor the same thing, the nrpe configuration file will need to differentiate them. Something like this:

command[check_freshports_online_jail1] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios jail1 /usr/local/libexec/nagios-custom/check_freshports_online
command[check_freshports_online_jail2] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios jail2 /usr/local/libexec/nagios-custom/check_freshports_online

That is not difficult to do. It is repetitive, but not difficult.
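The repetition can be generated instead of typed. The following is a minimal sketch, assuming the same paths used throughout this post; nrpe_commands_for is a hypothetical helper name, and jail1 and jail2 are the placeholder jail names from the lines above.

```shell
#!/bin/sh
# Sketch only: nrpe_commands_for is a hypothetical helper.

# Print one nrpe command definition per jail for the given check script.
# Usage: nrpe_commands_for <check-name> <jail>...
nrpe_commands_for() {
    check="$1"
    shift

    for jail in "$@"; do
        printf 'command[%s_%s] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios %s /usr/local/libexec/nagios-custom/%s\n' \
            "${check}" "${jail}" "${jail}" "${check}"
    done
}

nrpe_commands_for check_freshports_online jail1 jail2
```

The output could be redirected into a file under /usr/local/etc/nrpe.d. Each generated command still needs a matching sudoers entry for its jail.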

I’ll get started on deploying this as soon as I can.