It was May 2021 when I tweeted about monitoring FreeBSD jails which had jail IP addresses only in the 127.0.0.0/8 range. Yesterday, nearly 6 months later, I did the first test of this. This came up because I’m getting a new FreshPorts node ready.
I’ve created a file in the jail to be run from the host. That script runs in the jail but is initiated by a process on the host.
In this post:
- FreeBSD 13.0
- Nagios 3.5.1
- nrpe 3.2.1
- ingress refers to a jail which processes commits to the FreeBSD repos and loads them into the FreshPorts database
- /usr/local/libexec/nagios-custom is a directory I use to store my custom Nagios scripts.
Installation of the above is outside the scope of this post. It is presumed that you have a working Nagios and nrpe installed.
The usual situation
In the past, each of my jails has had an RFC-1918 IP address (e.g. 10.0.0.2) and net-mgmt/nrpe3 runs within the jail. nagios, running on my webserver, contacts the jail via port 5666 where nrpe3 is listening. nrpe3 runs a script (which is installed in the jail) and sends the results back to nagios.
What will be different?
The jails will have IP addresses in the 127.0.0.0/8 range (e.g. 127.63.0.10)
The main difference is what nagios contacts. Instead of contacting the jail, it will contact the host. nrpe on the host will use jexec to run a script located in the jail.
The proof of concept
This section outlines the proof of concept I did. I will go through a single monitoring check showing all the pieces installed.
I started with the ingress01 jail. This is the only IP address in the jail:
lo1: flags=8049 metric 0 mtu 16384
	options=680003
	inet6 fd80::10 prefixlen 8
	inet 127.163.0.10 netmask 0xffffffff
	groups: lo
All the monitoring scripts required for this jail are already installed in this jail. I picked one: /usr/local/libexec/nagios-custom/check_freshports_online. That script looks like this:
[r720-02 dan ~] % cat /jails/ingress01/usr/local/libexec/nagios-custom/check_freshports_online
#!/bin/sh

if [ -f /usr/websites/freshports.org/scripts/OFFLINE ]
then
  STATUS='offline'
else
  STATUS='online'
fi

case "${STATUS}" in
  "online")
    echo "OK: ${STATUS}"
    exit 0
    ;;
  "offline")
    echo "WARNING: ${STATUS}"
    exit 1;
    ;;
  *)
    echo "CRITICAL: unknown"
    exit 3;
    ;;
esac
[r720-02 dan ~] %
This script just checks the existence of a file, which I think should be over in /var/db/freshports not /usr/websites/freshports.org/scripts, but that fix is for another day.
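The logic of that check can be tried without touching the real flag file. The following is a minimal sketch, not the script from the jail: the function name check_online and the idea of passing the flag file path as an argument are assumptions made for illustration. It keeps the same test and the same exit codes (0 for online, 1 for offline).

```shell
#!/bin/sh
# Sketch only: same test as check_freshports_online, but the flag file
# path is passed as an argument (hypothetical) so it can be pointed at
# a throwaway directory.
check_online() {
    flag_file="$1"
    if [ -f "${flag_file}" ]; then
        echo "WARNING: offline"
        return 1
    fi
    echo "OK: online"
    return 0
}

# Exercise both states in a temporary directory.
tmpdir=$(mktemp -d)

status=0
check_online "${tmpdir}/OFFLINE" || status=$?    # no flag file yet
echo "exit status: ${status}"

touch "${tmpdir}/OFFLINE"
status=0
check_online "${tmpdir}/OFFLINE" || status=$?    # flag file present
echo "exit status: ${status}"

rm -rf "${tmpdir}"
```

Keeping the path as a parameter would also make it easy to move the flag file to another directory later without editing the test itself.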
I’ll run that script over here, in the jail:
[r720-02-ingress01 root ~] # /usr/local/libexec/nagios-custom/check_freshports_online
OK: online
The key to this proof of concept is being able to run those scripts from the host. I tried this:
[r720-02 dan ~] % sudo jexec ingress01 /usr/local/libexec/nagios-custom/check_freshports_online
OK: online
That’s it. If your check can be run on the host, this approach should work.
Hold on, that’s not quite the same
While writing this, I realized the above test did not replicate what usually happens in my jails.
When nrpe runs, it runs as the nagios user.
The jexec command runs it as root.
We could run all our checks as root, but that’s less than ideal. They are already designed to run as nagios. For security, and less work, let’s still run them as nagios.
Running as nagios
Here is the same command running as the right user:
[r720-02 dan ~] % sudo jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online
OK: online
The -U specifies the user name from the “jailed environment as whom the command should run”.
Fantastic. I like it.
Why run as Nagios?
The real question is not why run this as nagios.
The real question is: Why would you run this as root?
The jexec command is invoked on the host by the nagios user, which will need sudo permission to do so. That permission will be strict and specify the full command which can be run.
There is no need to run this as root in the jail.
The script could be modified to run on the host using the jail PID. However, all of these scripts were designed to run in a jail as nagios. Being consistent with the scripts, whether running them from the host or in the jail, is my preferred approach.
nrpe configuration on the host
My nrpe configuration contains this bit:
[r720-02 dan /usr/local/etc] % tail nrpe.cfg
#include=
# INCLUDE CONFIG DIRECTORY
# This directive allows you to include definitions from config files (with a
# .cfg extension) in one or more directories (with recursion).
#include_dir=
include_dir=/usr/local/etc/nrpe.d
The include_dir directive will pull in any files with a .cfg extension within the /usr/local/etc/nrpe.d directory.
I created /usr/local/etc/nrpe.d/freshports-ingress-via-jexec.cfg:
[r720-02 dan /usr/local/etc/nrpe.d] % cat freshports-ingress-via-jexec.cfg
command[check_freshports_online] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online
freshports-ingress contains the non-jexec version of the checks for a FreshPorts ingress jail. The name of the new file is derived from that.
sudo permission for nagios on the host
The /usr/local/etc/sudoers file on the host contains this line:
#includedir /usr/local/etc/sudoers.d
Which means any file in /usr/local/etc/sudoers.d can contain sudo specifications.
Case in point, for the jexec above, we have:
[r720-02 dan /usr/local/etc/sudoers.d] % cat nrpe-freshports-ingress-via-jexec
nagios ALL=(ALL) NOPASSWD:/usr/sbin/jexec -U nagios ingress01 /usr/local/libexec/nagios-custom/check_freshports_online
This gives the nagios user permission to invoke that command.
jail configuration
As I typed this, I realized I need to modify my ansible scripts so that installing only my nagios-custom scripts is possible. However, that is not relevant to this post. I mention it only in case you need to modify your processes.
In the jail I have only this file, already mentioned above:
cat /usr/local/libexec/nagios-custom/check_freshports_online
#!/bin/sh

if [ -f /usr/websites/freshports.org/scripts/OFFLINE ]
then
  STATUS='offline'
else
  STATUS='online'
fi

case "${STATUS}" in
  "online")
    echo "OK: ${STATUS}"
    exit 0
    ;;
  "offline")
    echo "WARNING: ${STATUS}"
    exit 1;
    ;;
  *)
    echo "CRITICAL: unknown"
    exit 3;
    ;;
esac
Testing this with nagios
On my webserver, which has Nagios installed, I tried this:
/usr/local/libexec/nagios/check_nrpe3 -H r720-02.vpn.unixathome.org -c check_freshports_online
OK: online
Great. This should work from within nagios. If you get this:
/usr/local/libexec/nagios/check_nrpe3 -H r720-02.vpn.unixathome.org -c check_freshports_online
NRPE: Unable to read output
You should check /var/log/auth.log on the host and the jail. It is probably permissions.
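When chasing this kind of failure, it can help to see both the output and the exit status of a check in one place. In my understanding, that NRPE message appears when the command produced no output at all. The sketch below is an illustration, not part of my setup: the function names nagios_state and run_check are hypothetical. It relies only on the usual plugin convention of exit status 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.

```shell
#!/bin/sh
# Map a plugin exit status to the Nagios state name.
# Anything outside 0-2 is reported as UNKNOWN in this sketch.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any command, then show its output (stdout and stderr combined)
# followed by its exit status and the matching state name.
run_check() {
    status=0
    output=$("$@" 2>&1) || status=$?
    echo "output: ${output}"
    echo "status: ${status} ($(nagios_state "${status}"))"
}

# Stand-ins for a real check, so the sketch runs anywhere.
run_check sh -c 'echo "OK: online"; exit 0'
run_check sh -c 'exit 1'
```

On the host, the stand-in commands would be replaced by the exact command from the nrpe configuration, run as the nagios user. An empty output line there points at the command failing before the script ever ran.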
Not yet deployed
So far, this is only a proof-of-concept, but I plan to deploy it to monitor all three jails on this host.
What will pessimists say? They will say this doesn’t scale. You will need the same command, slightly different, for each jail. If each jail needs to monitor the same thing, the nrpe configuration file will need to differentiate them. Something like this:
command[check_freshports_online_jail1] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios jail1 /usr/local/libexec/nagios-custom/check_freshports_online
command[check_freshports_online_jail2] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios jail2 /usr/local/libexec/nagios-custom/check_freshports_online
That is not difficult to do. It is repetitive, but not difficult.
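The repetition can be generated rather than typed. Here is a minimal sketch, assuming the jail names are passed as arguments. The function names nrpe_commands and sudoers_rules are hypothetical, and jail1 and jail2 are the placeholder names from the example above. Because the sudo permission names the full command, each jail also needs its own sudoers rule, so the sketch prints those as well.

```shell
#!/bin/sh
# Sketch: generate the per-jail lines instead of writing them by hand.
CHECK=/usr/local/libexec/nagios-custom/check_freshports_online

# One nrpe command definition per jail name given as an argument.
nrpe_commands() {
    for jail in "$@"; do
        echo "command[check_freshports_online_${jail}] = /usr/local/bin/sudo /usr/sbin/jexec -U nagios ${jail} ${CHECK}"
    done
}

# The matching sudoers rule for each jail, one full command per rule.
sudoers_rules() {
    for jail in "$@"; do
        echo "nagios ALL=(ALL) NOPASSWD:/usr/sbin/jexec -U nagios ${jail} ${CHECK}"
    done
}

nrpe_commands jail1 jail2
sudoers_rules jail1 jail2
```

The output of each function would be redirected into its own file, one under nrpe.d and one under sudoers.d.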
I’ll get started on deploying this as soon as I can.