Yesterday, I noticed this message in one of the “daily security run output” emails which FreeBSD hosts can send out.
nrpe3-3.2.1_1: Tag: expiration_date Value: 2023-06-03
nrpe3-3.2.1_1: Tag: deprecated Value: Fails to build with recent OpenSSL so use net-mgmt/nrpe
I’ve used net-mgmt/nrpe3 for several years. It checks remote hosts and runs any number of predetermined commands and returns the results. It’s stable, highly configurable, and just keeps running.
I had a look at the replacement (net-mgmt/nrpe) and decided to build and install it. First, it went onto my poudriere package building host, pkg01. That worked fine. Next I installed it on my Nagios server.
That did not go fine. Every check which used nrpe went red, with “(Return code of 127 is out of bounds – plugin may be missing)”.
I deinstalled nrpe and reinstalled nrpe3 and all Nagios checks returned to normal. I didn’t have time yesterday to figure out the cause.
The tweet
I did tweet about it. The solution came in about three hours later: use -3.
The server fix
I looked at my commands.cfg (usually in /usr/local/etc/nagios) and searched for the uses of nrpe. What I found gave me the aha! moment.
command_line $USER1$/check_nrpe3 -H $HOSTADDRESS$ -c $ARG1$ -t 15
Of course, I bet check_nrpe3 becomes check_nrpe with the new package.
To prepare the fix, I took two copies of the file:
[13:35 webserver dan /usr/local/etc/nagiosql] % sudo cp commands.cfg commands.cfg.for.nrpe3
[13:36 webserver dan /usr/local/etc/nagiosql] % sudo cp commands.cfg commands.cfg.for.nrpe4
Notice I am not in the expected directory. I have reconfigured Nagios because I’m using net-mgmt/nagiosql to configure it, and it’s convenient for me to use a non-standard directory.
I took two copies so I could easily swap back and forth between the configurations should I need to revert again.
I modified commands.cfg.for.nrpe4, changing the command name from check_nrpe3 to check_nrpe and adding the -3 argument.
- command_line $USER1$/check_nrpe3 -H $HOSTADDRESS$ -c $ARG1$ -t 15
+ command_line $USER1$/check_nrpe -3 -H $HOSTADDRESS$ -c $ARG1$ -t 15
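The same edit can be scripted rather than made by hand. The following is a minimal sketch, assuming the only change needed is the plugin name plus the -3 flag; the function name convert_nrpe_commands is hypothetical and not part of any package.

```shell
#!/bin/sh
# Sketch: write a copy of a Nagios commands file in which check_nrpe3
# invocations become "check_nrpe -3".
# convert_nrpe_commands is a hypothetical helper name.
convert_nrpe_commands() {
    src="$1"
    dst="$2"
    # Match the plugin name only when whitespace follows it, and keep that
    # whitespace in the output, so the arguments after it are untouched.
    sed 's|/check_nrpe3\([[:space:]]\)|/check_nrpe -3\1|' "$src" > "$dst"
}

# Example use with the copies made earlier:
#   convert_nrpe_commands commands.cfg.for.nrpe3 commands.cfg.for.nrpe4
#   diff -u commands.cfg.for.nrpe3 commands.cfg.for.nrpe4
```

Writing to a separate file keeps the nrpe3 version intact for a quick revert, and the diff shows exactly which command definitions changed before anything is copied into place.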
I copied the file into place:
[13:37 webserver dan /usr/local/etc/nagiosql] % sudo cp commands.cfg.for.nrpe4 commands.cfg
Next, some magic to stop, install, and start nrpe:
[13:37 webserver dan /usr/local/etc/nagiosql] % sudo service nrpe3 stop && sudo pkg install -y nrpe && sudo service nrpe start
OK, time for a test check:
[13:38 webserver dan /usr/local/etc/nagiosql] % /usr/local/libexec/nagios/check_nrpe -3 -H dev-nginx01 -c check_var_mail
All OK. No email found
Pick your own host and a command, if you like. Mine worked.
I ran ‘Schedule a check of all services on this host’ for my dev-nginx01 host. Everything went red. Again. WTF.
Oh wait, new commands mean a nagios reload:
[13:39 webserver dan ~] % sudo service nagios reload
Performing sanity check of nagios configuration: OK
The client fix
Next, I went back to pkg01 and tried the new client there:
[14:09 pkg01 dan ~] % sudo service nrpe3 stop && sudo pkg install -y nrpe && sudo service nrpe start
Stopping nrpe3.
Waiting for PIDS: 63751.
Updating local repository catalogue...
[pkg01.int.unixathome.org] Fetching meta.conf: 100% 163 B 0.2kB/s 00:01
[pkg01.int.unixathome.org] Fetching packagesite.pkg: 100% 277 KiB 283.7kB/s 00:01
Processing entries: 100%
local repository update completed. 1064 packages processed.
All repositories are up to date.
Checking integrity... done (1 conflicting)
- nrpe-4.1.0 conflicts with nrpe3-3.2.1_1 on /usr/local/etc/nrpe.cfg.sample
Checking integrity... done (0 conflicting)
The following 2 package(s) will be affected (of 0 checked):
Installed packages to be REMOVED:
nrpe3: 3.2.1_1
New packages to be INSTALLED:
nrpe: 4.1.0
Number of packages to be removed: 1
Number of packages to be installed: 1
[pkg01.int.unixathome.org] [1/2] Deinstalling nrpe3-3.2.1_1...
[pkg01.int.unixathome.org] [1/2] Deleting files for nrpe3-3.2.1_1: 100%
==> You should manually remove the "nagios" user.
==> You should manually remove the "nagios" group
[pkg01.int.unixathome.org] [2/2] Installing nrpe-4.1.0...
===> Creating groups.
Using existing group 'nagios'.
===> Creating users
Using existing user 'nagios'.
===> Creating homedir(s)
[pkg01.int.unixathome.org] [2/2] Extracting nrpe-4.1.0: 100%
You may need to manually remove /usr/local/etc/nrpe.cfg if it is no longer needed.
=====
Message from nrpe-4.1.0:
--
Enable NRPE in /etc/rc.conf with the following line:
nrpe_enable="YES"
A sample configuration is available in /usr/local/etc/nrpe.cfg.sample.
Copy to nrpe.cfg where required and edit to suit your needs.
Starting nrpe.
After running all the checks again for that host, everything was green in Nagios.
One lingering problem
I noticed this on the client:
[14:09 pkg01 dan ~] % tail -F /var/log/messages
May 5 14:10:10 pkg01 nrpe[35745]: Error: (use_ssl == true): Request packet version was invalid!
May 5 14:10:10 pkg01 nrpe[35745]: Could not read request from client 2001:db8::ea2d, bailing out...
May 5 14:10:10 pkg01 nrpe[35753]: Error: (use_ssl == true): Request packet version was invalid!
May 5 14:10:10 pkg01 nrpe[35753]: Could not read request from client 2001:db8::ea2d, bailing out...
May 5 14:10:10 pkg01 nrpe[35764]: Error: (use_ssl == true): Request packet version was invalid!
May 5 14:10:10 pkg01 nrpe[35764]: Could not read request from client 2001:db8::ea2d, bailing out...
May 5 14:10:10 pkg01 nrpe[35770]: Error: (use_ssl == true): Request packet version was invalid!
May 5 14:10:10 pkg01 nrpe[35770]: Could not read request from client 2001:db8::ea2d, bailing out...
May 5 14:10:10 pkg01 nrpe[35778]: Error: (use_ssl == true): Request packet version was invalid!
May 5 14:10:10 pkg01 nrpe[35778]: Could not read request from client 2001:db8::ea2d, bailing out...
This post indicates it’s an expected problem. But that was 7 years ago. Still, I feel that’s a terrible decision. Let’s fix the error please.
Twitter suggested not using -3 on the server with a 4.x client. This changes the logged messages to:
May 5 15:32:14 pkg01 nrpe[98500]: SSL Not asking for client certification
Just one line now. I’d still like to get rid of that.
Found it
On May 6th, I tried commenting out this entry from /usr/local/etc/nrpe.cfg and restarting nrpe:
ssl_logging=0x20
I had changed this entry from the default value (0x00) while debugging earlier problems in this issue.
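To see whether a debugging value like this is still active on a client, a quick check of the config file helps. This is a small sketch only; the helper name active_ssl_logging is hypothetical, and it assumes the usual one key=value per line layout of nrpe.cfg.

```shell
#!/bin/sh
# Sketch: print the last uncommented ssl_logging value found in an nrpe.cfg.
# Prints nothing when the entry is absent or commented out.
# active_ssl_logging is a hypothetical helper name.
active_ssl_logging() {
    cfg="$1"
    # Lines commented out with a leading '#' do not match, because the
    # pattern only allows whitespace before the key.
    grep -E '^[[:space:]]*ssl_logging=' "$cfg" | tail -n 1 | cut -d= -f2
}

# Example use on a client:
#   active_ssl_logging /usr/local/etc/nrpe.cfg
```

Empty output means no ssl_logging line is in effect in that file, which is the state after commenting the entry out.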
Trying a second client
I tried a second client:
[17:31 r730-01 dvl ~] % sudo service nrpe3 stop && sudo pkg install -y nrpe && sudo service nrpe start
Stopping nrpe3.
Waiting for PIDS: 40232 92918 92921 92924 92926 92931 92945 92948 92950 92955 92958.
Updating local repository catalogue...
local repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):
New packages to be INSTALLED:
nrpe: 4.1.0
Number of packages to be installed: 1
42 KiB to be downloaded.
[1/1] Fetching nrpe-4.1.0.pkg: 100% 42 KiB 42.6kB/s 00:01
Checking integrity... done (1 conflicting)
- nrpe-4.1.0 conflicts with nrpe3-3.2.1_1 on /usr/local/etc/nrpe.cfg.sample
Checking integrity... done (0 conflicting)
Conflicts with the existing packages have been found.
One more solver iteration is needed to resolve them.
The following 3 package(s) will be affected (of 0 checked):
Installed packages to be REMOVED:
nrpe3: 3.2.1_1
New packages to be INSTALLED:
nrpe: 4.1.0
Installed packages to be REINSTALLED:
pkg-1.19.1_1
Number of packages to be removed: 1
Number of packages to be installed: 1
Number of packages to be reinstalled: 1
[1/3] Deinstalling nrpe3-3.2.1_1...
[1/3] Deleting files for nrpe3-3.2.1_1: 100%
==> You should manually remove the "nagios" user.
==> You should manually remove the "nagios" group
[2/3] Installing nrpe-4.1.0...
===> Creating groups.
Using existing group 'nagios'.
===> Creating users
Using existing user 'nagios'.
===> Creating homedir(s)
[2/3] Extracting nrpe-4.1.0: 100%
[3/3] Reinstalling pkg-1.19.1_1...
[3/3] Extracting pkg-1.19.1_1: 100%
You may need to manually remove /usr/local/etc/nrpe.cfg if it is no longer needed.
=====
Message from nrpe-4.1.0:
--
Enable NRPE in /etc/rc.conf with the following line:
nrpe_enable="YES"
A sample configuration is available in /usr/local/etc/nrpe.cfg.sample.
Copy to nrpe.cfg where required and edit to suit your needs.
Cannot 'start' nrpe. Set nrpe_enable to YES in /etc/rc.conf or use 'onestart' instead of 'start'.
[17:34 r730-01 dvl ~] % sudo service nrpe enable
nrpe enabled in /etc/rc.conf
[17:35 r730-01 dvl ~] % sudo service nrpe start
Starting nrpe.
Ahh, I’m missing the enable from the magic sequence of commands.
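With the enable step added, the sequence might be wrapped up as in the sketch below. The function name migrate_nrpe and the NRPE_RUN variable are hypothetical; NRPE_RUN exists only so the steps can be printed with echo before running them for real with sudo.

```shell
#!/bin/sh
# Sketch: stop nrpe3, install nrpe, enable it, and start it, in that order.
# migrate_nrpe and NRPE_RUN are hypothetical names used for illustration.
# NRPE_RUN defaults to sudo; set it to echo to print the steps instead.
migrate_nrpe() {
    run="${NRPE_RUN:-sudo}"
    # Each step runs only if the previous one succeeded.
    $run service nrpe3 stop &&
    $run pkg install -y nrpe &&
    $run service nrpe enable &&
    $run service nrpe start
}

# Dry run first, then the real thing:
#   NRPE_RUN=echo; migrate_nrpe
#   unset NRPE_RUN; migrate_nrpe
```

Chaining with && means a failed install leaves nrpe neither enabled nor started, so a half-finished migration is easy to spot.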
The ansible script
With that success, I was ready to push this out everywhere.
I created this ansible script, available at https://git.langille.org/dvl/ansible/src/branch/main/nrpe.
NOTE: there is an error here. See below for details.
---
- hosts: all
  # want to do this: sudo service nrpe3 stop && sudo pkg install -y nrpe && sudo service nrpe enable && sudo service nrpe start
  tasks:
    - service:
        name: nrpe3
        state: stopped
    - name: remove nrpe3
      pkgng: name=nrpe3 state=absent
    - name: remove nrpe3_enabled
      command: "sysrc -x nrpe3_enable"
    - name: add nrpe
      pkgng: name=nrpe state=present
    - service:
        name: nrpe
        state: started
My thanks to those who helped with suggestions. Cheers.
Corrected ansible script
Today (2023-05-07) I noticed that nrpe was not enabled on a jail. I checked another jail. It wasn’t enabled on that jail either. The pkgng: name=nrpe state=present directive above does not enable the service.
I wrote a correcting script:
[12:38 ansible root /usr/local/etc/ansible] # cat nrpe-enable.yml
---
- hosts: all
  tasks:
    - name: enable nrpe
      ansible.builtin.service:
        name: nrpe
        enabled: true
[12:38 ansible root /usr/local/etc/ansible] #
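A spot check after running the playbook can confirm the setting landed. The sketch below reads a single rc.conf-style file directly; the helper name nrpe_enabled_in is hypothetical, and it assumes the value is written as a plain nrpe_enable= assignment in that one file (it does not look at rc.conf.local or other override files).

```shell
#!/bin/sh
# Sketch: succeed if the given rc.conf-style file enables nrpe.
# nrpe_enabled_in is a hypothetical helper name.
nrpe_enabled_in() {
    rcfile="$1"
    # Take the last uncommented nrpe_enable assignment, drop any trailing
    # comment, then strip whitespace and quotes from the value.
    value=$(grep -E '^[[:space:]]*nrpe_enable=' "$rcfile" \
        | tail -n 1 | cut -d= -f2 | sed 's/#.*//' \
        | tr -d '[:space:]' | tr -d "\"'")
    # Accept the usual rc.conf spellings of "yes", in any letter case.
    case "$value" in
        [Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   nrpe_enabled_in /etc/rc.conf && echo "nrpe is enabled"
```

On a live FreeBSD host, sysrc -n nrpe_enable reports the effective value and is the simpler choice; the file-based version is mainly useful when inspecting a jail's rc.conf from the host.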