It is time to replace my existing UPS with another one. I’m getting only 3 minutes of runtime with the existing batteries (and new batteries, after recalibration). It was suggested I buy an Eaton 5PX. I wasn’t convinced.
This is the first of three articles about nut. The second is about testing the shutdown. The third will be about testing both shutdown and startup timings.
Two days later, I’d purchased a new Eaton UPS and a battery pack. Please follow that suggested link above if you want to see my thought processes on that.
This post will be mostly about NUT (Network UPS Tools), which I will probably refer to as nut throughout, mostly because the FreeBSD port is called nut.
The UPS replacement process (out with the old, in with the new) was live tweeted, with photographs.
In this post:
- FreeBSD 12.1
- nut 2.7.4
- Eaton 5PX: 5PX2200RT – 2U Line Interactive UPS
- Eaton EBM: 5PXEBM48RT – external battery pack
- pfSense 2.4.5-RELEASE-p1 (amd64)
If you notice any issues/problems with this post, please let me know and I’ll see what I can fix.
HEADS UP
The nut package includes a devd configuration that sets the device permissions nut needs to talk to the UPS via the USB connection. These permissions are applied during boot. The easiest solution: after installing nut for the first time, make sure the USB cable is connected, and reboot.
Have a look at /usr/local/etc/devd/nut-usb.conf which contains everything nut needs.
Ignore all the posts about needing devfs permissions. That is no longer the case.
If you run this command and get similar output:
$ sudo /usr/local/libexec/nut/usbhid-ups -DDDDD -a ups02
Network UPS Tools - Generic HID driver 0.41 (2.7.4)
USB communication driver 0.33
   0.000000 send_to_all: SETINFO driver.parameter.port "auto"
   0.000031 debug level is '5'
   0.000614 upsdrv_initups...
   0.000739 No appropriate HID device found
   0.000753 No matching HID UPS found
… then you probably need to reboot.
You might find several permission issues on .conf files in /usr/local/etc/nut. I hope to get the package patched to avoid that.
My objectives
My objective with nut is to gracefully shut down my servers when the UPS batteries get too low to sustain power.
I also want to monitor nut and verify that everything is OK. I want to do that monitoring with Nagios. Nagios will only report status and will not take action. nut will take care of shutting down servers.
What is nut?
As reproduced from Network UPS Tools Overview
Network UPS Tools is a collection of programs which provide a common interface for monitoring and administering UPS, PDU and SCD hardware. It uses a layered approach to connect all of the parts.
Drivers are provided for a wide assortment of equipment. They understand the specific language of each device and map it back to a compatibility layer. This means both an expensive high end UPS, a simple “power strip” PDU, or any other power device can be handled transparently with a uniform management interface.
This information is cached by the network server upsd, which then answers queries from the clients. upsd contains a number of access control features to limit the abilities of the clients. Only authorized hosts may monitor or control your hardware if you wish. Since the notion of monitoring over the network is built into the software, you can hang many systems off one large UPS, and they will all shut down together. You can also use NUT to power on, off or cycle your data center nodes, individually or globally through PDU outlets.
The components
To learn more about the components, I recommend reading section 1 of NUT – Introduction to Network UPS Tools -Configuration Examples (pdf). Most of this section is taken directly from that.
These are the basic components of nut:
- Driver daemon – talks to the UPS hardware and is aware of the state of the UPS. Drivers share a command interface, upsdrvctl.
- upsd – a daemon which runs permanently in the box to which one or more UPS’s are attached. It scans the UPS using the UPS-specific driver and maintains an abstracted image of the UPS in memory. The various parts of the abstracted image have standardized names, and a key part is ups.status which gives the current status of the UPS unit.
- upsmon – a client of upsd. It runs permanently as a daemon on a local or remote box. It polls the status changes of the UPS unit. It is able to react to those state changes by emitting warning messages, or shutting down the box.
- upsc – a simple utility program to talk to upsd and retrieve details of the UPS. For example, “What UPS are attached to the local host?”
[dan@slocum:~] $ upsc -l
ups02
heartbeat
[dan@slocum:~] $
That should get you started. I wish I’d read that PDF earlier in my adventure.
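The ups.status value mentioned above is a space-separated list of short flags. As a rough illustration only, here is a minimal sketch that turns a few of the common flags into words. The function name describe_status is made up for this example, only a handful of flags are handled, and the sample value is hard-coded rather than read from a live UPS.

```shell
#!/bin/sh
# Sketch: translate the space-separated flags in ups.status into words.
# describe_status is a hypothetical helper, not part of nut.
# Only a few common flags are handled; anything else is reported as-is.

describe_status() {
    # $1 is the raw value, e.g. "OL CHRG" or "OB LB"
    out=""
    for flag in $1; do
        case "$flag" in
            OL)   word="on line power" ;;
            OB)   word="on battery" ;;
            LB)   word="low battery" ;;
            CHRG) word="charging" ;;
            RB)   word="replace battery" ;;
            FSD)  word="forced shutdown" ;;
            *)    word="unrecognised flag $flag" ;;
        esac
        # Join the words with ", " once there is more than one
        out="${out:+$out, }$word"
    done
    printf '%s\n' "$out"
}

# With nut running, this could be: describe_status "$(upsc ups02 ups.status)"
describe_status "OB LB"
```

The hard-coded call stands in for a real query so the sketch runs on a host without nut installed.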
Compatibility
nut has a great interactive hardware compatibility tool. I went there and did this:
- Support Level – ignored
- Device type – Uninterruptible Power Supply
- Manufacturer – Eaton
- Model – 5 PX
- Connection – ignored
This shows me I can use either the serial port or the USB port. I decided on USB, which uses the usbhid-ups driver.
Note the color of the Driver column and compare it to the Support level legend at the top of the page.
My configuration
I have four hosts at home and one UPS. The UPS will be attached to one host via a USB cable. That host is the primary. The others will be secondary and will depend upon the primary for information.
This is similar to how I use apcupsd.
This diagram, taken from nut Monitoring diagrams, outlines my approach.
My plan
I like monitoring. When I say that, I mean Nagios (or similar tools), not monitoring your UPS. I also like metrics, which I gather via snmp and LibreNMS.
It was a reply to my post on the nut user mailing list which prompted me to use a heartbeat solution for additional monitoring. It also allows me to use check_ups (a Nagios plugin, supplied by the net-mgmt/nagios-plugins FreeBSD port). I will also monitor the pid files of upsd and upsmon on each secondary host.
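Checking a pid file can be sketched roughly as follows. This is an illustration under assumptions, not the check actually in use: the function name check_pid_file is invented, the return codes are arbitrary, and the upsmon pid file path is a guess (the upsd path is the one shown later in this post).

```shell
#!/bin/sh
# Sketch: report whether the process named in a pid file is alive.
# check_pid_file is a hypothetical helper.
# Returns 0 if running, 1 if the pid looks stale,
# 2 if the file is missing, unreadable, or not a number.

check_pid_file() {
    pidfile=$1
    [ -r "$pidfile" ] || return 2
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 2 ;;    # empty or not a plain number
    esac
    # kill -0 sends no signal, it only tests for existence, but it fails
    # for processes owned by another user, so fall back to ps in that case.
    kill -0 "$pid" 2>/dev/null || ps -p "$pid" >/dev/null 2>&1 || return 1
    return 0
}

# The upsmon.pid location is an assumption; adjust to your system.
for f in /var/db/nut/upsd.pid /var/db/nut/upsmon.pid; do
    rc=0
    check_pid_file "$f" || rc=$?
    case $rc in
        0) echo "OK: $f" ;;
        1) echo "CRITICAL: stale pid in $f" ;;
        *) echo "UNKNOWN: cannot read $f" ;;
    esac
done
```

On a host without nut the loop simply prints UNKNOWN for both files.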
pfSense/primary configuration
My pfSense host (bast) will be the primary. I’m going to show you screen shots and then show you the file contents. The files appear below and are in the same order as in the screen shot.
These are big screen shots. Please click on them for details.
The screen shots have redacted areas. For those configuration items, the examples are not what I am using but are similar.
Since pfSense is FreeBSD, you can just use the files instead of the GUI if your primary is FreeBSD.
Duplicate directives with different options can be ignored. It appears the pfSense UI supplies those and then takes the directives you have entered and appends them. Later directives override earlier directives and the expected outcome occurs.
ups.conf
# cat ups.conf
[heartbeat]
driver = dummy-ups
port = heartbeat.dev
desc = "Watch over NUT"

[ups02]
driver=usbhid-ups
port=auto
serial = G091C30079
Use port=auto for a UPS connected via USB. You’ll see references to using /dev/ugen1.4 – that is outdated. See man usbhid-ups.
I needed the serial number when I had two UPS connected to this host. I wanted to get the Eaton, not the APC.
Useless trivia: when I was composing this section of the blog post, I noticed the “Additional configuration lines for ups.conf” box in the screen shot. I had not noticed it before. I had that content in the “Extra Arguments to driver” box, farther up the page. I adjusted the pfSense configuration and saved it. Rather than take a new screen shot (which would involve the external monitors), I used gimp to modify the screen shot.
upsmon.conf
# cat upsmon.conf
MONITOR ups02 1 local-monitor password1 master
SHUTDOWNCMD "/sbin/shutdown -p +0"
POWERDOWNFLAG /etc/killpower
NOTIFYCMD /usr/local/pkg/nut/nut_email.php
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
NOTIFYFLAG FSD SYSLOG+WALL+EXEC
NOTIFYFLAG COMMOK SYSLOG+WALL+EXEC
NOTIFYFLAG COMMBAD SYSLOG+WALL+EXEC
NOTIFYFLAG SHUTDOWN SYSLOG+WALL+EXEC
NOTIFYFLAG REPLBATT SYSLOG+WALL+EXEC
NOTIFYFLAG NOCOMM SYSLOG+WALL+EXEC
NOTIFYFLAG NOPARENT SYSLOG+WALL+EXEC
NOTIFYCMD /usr/local/sbin/upssched
MONITOR heartbeat 0 upsmaster password2 master
NOTIFYFLAG ONLINE EXEC
NOTIFYFLAG ONBATT EXEC
#NOTIFYCMD /usr/local/etc/nut/upssched-cmd
SHUTDOWNCMD "/sbin/shutdown -h +0"
password1 is set and supplied by pfSense.
upsd.conf
$ cat upsd.conf
LISTEN 127.0.0.1
LISTEN ::1
LISTEN 10.0.0.73
LISTEN 2001:db8::1
upsd.users
# cat upsd.users
[admin]
password=password3
actions=set
instcmds=all

[local-monitor]
password=password1
upsmon master

[dvl]
password="password4"
actions = set
instcmds = ALL

[local-heartbeat]
password=password2
upsmon master

[slocum]
password=password5
upsmon slave

[knew]
password=password6
upsmon slave

[r720-01]
password=password7
upsmon slave
The [dvl] section is what allows me to use upscmd to talk to the UPS from the command line. The “quotes” are optional but I added them because I had special characters in the password.
upssched.conf
This file is not shown in the screen shot.
Yes, I agree /usr/local/etc/nut/upssched-cmd should be in /usr/local/bin.
# cat upssched.conf
# Restart timer which completes only if the dummy-ups heart beat
# has stopped. See timer values in heartbeat.dev
CMDSCRIPT /usr/local/etc/nut/upssched-cmd
PIPEFN /var/db/nut/upssched.pipe
LOCKFN /var/db/nut/upssched.lock
AT ONBATT heartbeat@localhost CANCEL-TIMER heartbeat-failure-timer
AT ONBATT heartbeat@localhost START-TIMER heartbeat-failure-timer 660
upssched-cmd
This file is not shown in the screen shot.
# cat upssched-cmd
#!/bin/sh
# upssched-cmd for workstation with heartbeat
logger -i -t upssched-cmd Calling upssched-cmd $1

# Send emails to/from these addresses
EMAIL_TO="noc@example.org"
EMAIL_FROM="upssched-cmd@example.org"

UPS="ups02"
STATUS=$( upsc $UPS ups.status )
CHARGE=$( upsc $UPS battery.charge )
CHMSG="[$STATUS]:$CHARGE%"

case $1 in
  online) MSG="$UPS, $CHMSG - power supply has been restored." ;;
  onbatt) MSG="$UPS, $CHMSG - power failure - save your work!" ;;
  lowbatt) MSG="$UPS, $CHMSG - shutdown now!" ;;
  heartbeat-failure-timer)
    MSG="NUT heart beat fails. $CHMSG"
    # Email to sysadmin
    MSG1="Hello, upssched-cmd reports NUT heartbeat has failed."
    MSG2="Current status: $CHMSG \n\n$0 $1"
    MSG3="\n\n$( ps -elf | grep -E 'ups[dms]|nut' )"
    # there can be no space after the -s argument.
    echo -e "$MSG1 $MSG2 $MSG3" | /usr/local/bin/mail.php -s"NUT heart beat fails. Currently $CHMSG"
    ;;
  *) logger -i -t upssched-cmd "Bad arg: \"$1\", $CHMSG"
     exit 1 ;;
esac
logger -i -t upssched-cmd $MSG
This script originated with the PDF.
Trouble getting it started?
After clicking save on the nut settings page, nut should start. If the settings page does not appear within 10-15 seconds, try starting again. If it fails again, go to the command line, then hunt down and kill the nut/ups processes left running. I hit this with heartbeat.
This is the huge status page and what you can expect to see. Click on it to see details. I have added a red dot beside the more important items.
dummy auth
I do not know why dummy-ups does not need authorization to communicate with the primary.
FreeBSD/secondary configuration
This is the configuration I used for secondary hosts. You can also find it in this git repo.
These files appear in alphabetical order only because it was easier for me when running cat.
heartbeat.dev
This is identical to the same file on the primary.
[dan@slocum:/usr/local/etc/nut] $ sudo cat heartbeat.dev
# heartbeat.dev -- 10 minute heartbeat
ups.status: OL
TIMER 300
ups.status: OB
TIMER 300
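The two TIMER lines are what make this a 10 minute heartbeat: 300 seconds of OL plus 300 seconds of OB, so an ONBATT event arrives once per cycle. The 660 second timer in upssched.conf is restarted on every ONBATT, so it should only expire if the heartbeat stops. The arithmetic can be sketched as follows; heartbeat_period is a name invented for this example and the sample data is simply copied from the file above.

```shell
#!/bin/sh
# Sketch: add up the TIMER lines in a heartbeat.dev-style file to get
# the length of one full OL/OB cycle in seconds.
# heartbeat_period is a hypothetical helper, not part of nut.

heartbeat_period() {
    # Reads the named file(s), or stdin when no file is given
    awk '$1 == "TIMER" { total += $2 } END { print total + 0 }' "$@"
}

# Sample data copied from heartbeat.dev; on a real host pass the file name.
period=$(heartbeat_period <<'EOF'
ups.status: OL
TIMER 300
ups.status: OB
TIMER 300
EOF
)

timer=660   # START-TIMER value used in upssched.conf
echo "cycle=${period}s timer=${timer}s margin=$((timer - period))s"
```

The margin is the slack between one heartbeat cycle and the failure timer; if the cycle were ever made longer than the timer, the failure alert would fire on every cycle.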
nut.conf
[dan@slocum:/usr/local/etc/nut] $ sudo cat nut.conf
# see https://networkupstools.org/docs/man/nut.conf.html
# or man nut.conf
MODE=netclient
ups.conf
[dan@slocum:/usr/local/etc/nut] $ sudo cat ups.conf
# see https://networkupstools.org/docs/man/dummy-ups.html
# under repeater mode
[ups02]
driver = dummy-ups
port = ups02@bast.example.org
desc = "dummy-ups in repeater mode"

[heartbeat]
driver = dummy-ups
port = heartbeat.dev
desc = "Watch over NUT"
upsd.conf
On each of my secondary nut servers, I change the last two LISTEN directives to IP addresses which are local to that server.
[dan@slocum:/usr/local/etc/nut] $ sudo cat upsd.conf
LISTEN 127.0.0.1
LISTEN ::1
LISTEN 10.55.0.73
LISTEN 2001:db8:2
upsd.users
The password field is different on each secondary and matches the one in upsmon.conf.
[dan@slocum:/usr/local/etc/nut] $ sudo cat upsd.users
[local-heartbeat]
password=heartpass
upsmon slave
upsmon.conf
password5 matches the password found in upsd.users on the primary.
heartpass matches the value found in upsd.users on this server.
[dan@slocum:/usr/local/etc/nut] $ sudo cat upsmon.conf
MONITOR ups02@bast.example.org 1 slocum password5 slave
SHUTDOWNCMD "/sbin/shutdown -h +0"
MONITOR heartbeat 0 upsmaster heartpass master
NOTIFYFLAG ONLINE EXEC
NOTIFYFLAG ONBATT EXEC
NOTIFYCMD /usr/local/sbin/upssched
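When matching passwords between upsmon.conf and upsd.users, it helps to keep the field order of a MONITOR line in mind: system, power value, username, password, type. The sketch below splits a line into those fields. parse_monitor is a made-up name, the input is a hard-coded example, and it assumes the password contains no whitespace.

```shell
#!/bin/sh
# Sketch: split an upsmon.conf MONITOR line into its fields.
# Field order: system, power value, username, password, type.
# parse_monitor is a hypothetical helper, not part of nut.

parse_monitor() {
    # Plain word-splitting; assumes no spaces inside the password.
    set -- $1
    [ "$1" = "MONITOR" ] && [ $# -eq 6 ] || return 1
    # The password ($5) is deliberately not printed.
    printf 'system=%s powervalue=%s user=%s type=%s\n' "$2" "$3" "$4" "$6"
}

parse_monitor "MONITOR ups02@bast.example.org 1 slocum password5 slave"
```

The user field is the section name to look for in upsd.users on the host named in the system field.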
upssched-cmd
This is nearly identical to the same file on the primary. The difference is in the mail command.
[dan@slocum:/usr/local/etc/nut] $ cat upssched-cmd
#!/bin/sh
# upssched-cmd for workstation with heartbeat
logger -i -t upssched-cmd Calling upssched-cmd $1

# Send emails to/from these addresses
EMAIL_TO="noc@example.org"
EMAIL_FROM="upssched-cmd@example.org"

UPS="ups02"
STATUS=$( upsc $UPS ups.status )
CHARGE=$( upsc $UPS battery.charge )
CHMSG="[$STATUS]:$CHARGE%"

case $1 in
  online) MSG="$UPS, $CHMSG - power supply has been restored." ;;
  onbatt) MSG="$UPS, $CHMSG - power failure - save your work!" ;;
  lowbatt) MSG="$UPS, $CHMSG - shutdown now!" ;;
  heartbeat-failure-timer)
    MSG="NUT heart beat fails. $CHMSG"
    # Email to sysadmin
    MSG1="Hello, upssched-cmd reports NUT heartbeat has failed."
    MSG2="Current status: $CHMSG \n\n$0 $1"
    MSG3="\n\n$( ps -elf | grep -E 'ups[dms]|nut' )"
    # mail needs a recipient; EMAIL_TO is defined above
    echo -e "$MSG1 $MSG2 $MSG3" | /usr/bin/mail -s "NUT heart beat fails. Currently $CHMSG" "$EMAIL_TO"
    ;;
  *) logger -i -t upssched-cmd "Bad arg: \"$1\", $CHMSG"
     exit 1 ;;
esac
logger -i -t upssched-cmd $MSG
upssched.conf
This is identical to the same file on the primary.
[dan@slocum:/usr/local/etc/nut] $ sudo cat upssched.conf
# Restart timer which completes only if the dummy-ups heart beat
# has stopped. See timer values in heartbeat.dev
CMDSCRIPT /usr/local/etc/nut/upssched-cmd
PIPEFN /var/db/nut/upssched.pipe
LOCKFN /var/db/nut/upssched.lock
AT ONBATT heartbeat@localhost CANCEL-TIMER heartbeat-failure-timer
AT ONBATT heartbeat@localhost START-TIMER heartbeat-failure-timer 660
FreeBSD enable, start, get info
To enable nut, I did this:
$ sudo sysrc nut_upsmon_enable="YES" nut_enable="YES"
Yes, you can specify multiple configuration items on one line.
Now, let’s start, nut first.
[dan@slocum:~] $ sudo service nut start
Network UPS Tools - UPS driver controller 2.7.4
Network UPS Tools - Device simulation and repeater driver 0.14 (2.7.4)
Network UPS Tools - Device simulation and repeater driver 0.14 (2.7.4)
Starting nut.
Network UPS Tools upsd 2.7.4
fopen /var/db/nut/upsd.pid: No such file or directory
listening on 2001:db8:2 port 3493
listening on 10.55.0.73 port 3493
listening on ::1 port 3493
listening on 127.0.0.1 port 3493
Connected to UPS [heartbeat]: dummy-ups-heartbeat
Connected to UPS [ups02]: dummy-ups-ups02
[dan@slocum:~] $
I don’t know why that error occurs:
[dan@slocum:~] $ sudo ls -l /var/db/nut/upsd.pid
-rw-r--r--  1 uucp  uucp  6 Sep  9 22:19 /var/db/nut/upsd.pid
[dan@slocum:~] $
Next, upsmon:
[dan@slocum:~] $ sudo service nut_upsmon start
Starting nut_upsmon.
Network UPS Tools upsmon 2.7.4
kill: No such process
UPS: ups02@bast.int.unixathome.org (slave) (power value 1)
UPS: heartbeat (monitoring only)
[dan@slocum:~] $
This is what I have running now:
[dan@slocum:~] $ ps auwwx | grep ups
uucp  64577  0.0  0.0   11600  2936  -  Ss  22:19  0:00.32 /usr/local/libexec/nut/dummy-ups -a ups02
uucp  64589  0.0  0.0   11464  2888  -  Ss  22:19  0:00.01 /usr/local/libexec/nut/dummy-ups -a heartbeat
uucp  64591  0.0  0.0  110004  2940  -  Ss  22:19  0:00.02 /usr/local/sbin/upsd
root  78899  0.0  0.0   11312  2852  -  Is  22:25  0:00.00 /usr/local/sbin/upsmon localhost
uucp  78900  0.0  0.0   11576  2880  -  S   22:25  0:00.00 /usr/local/sbin/upsmon localhost
There are two dummy-ups processes: one for ups02 (repeating the primary) and one for the heartbeat. upsd talks to both dummy-ups instances. upsmon normally runs as two processes, a root parent and an unprivileged child.
Let’s get some information from the UPS:
[dan@slocum:~] $ upsc ups02
battery.capacity: 9.00
battery.charge: 100
battery.charge.low: 20
battery.charge.restart: 0
battery.charger.status: resting
battery.energysave: no
battery.protection: yes
battery.runtime: 1941
battery.type: PbAc
device.mfr: EATON
device.model: Eaton 5PX 2200
device.serial: G091C30079
device.type: ups
driver.name: dummy-ups
driver.parameter.mode: repeater
driver.parameter.pollinterval: 2
driver.parameter.port: ups02@bast.int.unixathome.org
driver.parameter.synchronous: no
driver.version: 2.7.4
driver.version.internal: 0.14
input.current: 0.00
input.frequency: 59.9
input.frequency.extended: no
input.frequency.nominal: 60
input.sensitivity: normal
input.transfer.boost.low: 102
input.transfer.high: 151
input.transfer.low: 89
input.transfer.trim.high: 132
input.voltage: 116.7
input.voltage.extended: no
input.voltage.nominal: 120
outlet.1.autoswitch.charge.low: 0
outlet.1.current: 3.70
outlet.1.delay.shutdown: 65535
outlet.1.delay.start: 3
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 1
outlet.1.power: 432
outlet.1.powerfactor: 94.00
outlet.1.realpower: 410
outlet.1.status: on
outlet.1.switchable: yes
outlet.2.autoswitch.charge.low: 0
outlet.2.current: 5.50
outlet.2.delay.shutdown: 65535
outlet.2.delay.start: 6
outlet.2.desc: PowerShare Outlet 2
outlet.2.id: 2
outlet.2.power: 630
outlet.2.powerfactor: 97.00
outlet.2.realpower: 616
outlet.2.status: on
outlet.2.switchable: yes
outlet.current: 0.00
outlet.desc: Main Outlet
outlet.id: 0
outlet.power: 0
outlet.powerfactor: 0.00
outlet.realpower: 0
outlet.switchable: no
output.current: 9.10
output.frequency: 59.9
output.frequency.nominal: 60
output.powerfactor: 0.96
output.voltage: 116.7
output.voltage.nominal: 120
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.efficiency: 97
ups.firmware: 06
ups.load: 52
ups.load.high: 105
ups.mfr: EATON
ups.model: Eaton 5PX 2200
ups.power: 1062
ups.power.nominal: 2200
ups.productid: ffff
ups.realpower: 1030
ups.realpower.nominal: 1980
ups.serial: G091C30079
ups.shutdown: enabled
ups.start.auto: yes
ups.start.battery: yes
ups.start.reboot: yes
ups.status: OL
ups.test.interval: 604800
ups.test.result: Done and passed
ups.timer.shutdown: 0
ups.timer.start: 0
ups.type: offline / line interactive
ups.vendorid: 0463
[dan@slocum:~] $
Next, from heartbeat:
[dan@slocum:~] $ upsc heartbeat
device.mfr: Dummy Manufacturer
device.model: Dummy UPS
device.type: ups
driver.name: dummy-ups
driver.parameter.mode: dummy
driver.parameter.pollinterval: 2
driver.parameter.port: heartbeat.dev
driver.parameter.synchronous: no
driver.version: 2.7.4
driver.version.internal: 0.14
ups.mfr: Dummy Manufacturer
ups.model: Dummy UPS
ups.status: OL
[dan@slocum:~] $
Nagios monitoring
Here is a simple Nagios check:
[dan@webserver:~] $ /usr/local/libexec/nagios/check_ups -H slocum -u ups02
UPS OK - Status=Online Utility=117.3V Batt=100.0% Load=50.0% Left=33.1min|voltage=117.300000;;;0.000000 battery=100.000000%;;;0.000000;100.000000 load=50.000000%;;;0.000000;100.000000 left=33.066667;;;0.000000
[dan@webserver:~] $
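Everything after the | in that output is Nagios performance data, as label=value pairs. If you ever want one of those numbers in a script, a rough sketch like the following would pull it out. perfdata_value is a name invented for this example, and the input line is the output shown above, pasted in rather than fetched live.

```shell
#!/bin/sh
# Sketch: extract one value from the performance data part of a
# Nagios plugin output line (the text after the "|").
# perfdata_value is a hypothetical helper, not part of nagios-plugins.

perfdata_value() {
    # $1 = perfdata label; the plugin output line is read from stdin.
    # Drop everything up to the "|", put each label=value item on its
    # own line, then print the value for the requested label.
    sed 's/^[^|]*|//' | tr ' ' '\n' |
        awk -F'[=;]' -v key="$1" '$1 == key { print $2; exit }'
}

line='UPS OK - Status=Online Utility=117.3V Batt=100.0% Load=50.0% Left=33.1min|voltage=117.300000;;;0.000000 battery=100.000000%;;;0.000000;100.000000 load=50.000000%;;;0.000000;100.000000 left=33.066667;;;0.000000'

left=$(printf '%s\n' "$line" | perfdata_value left)
echo "minutes left: $left"
```

Labels are case sensitive here, so left (perfdata) and Left (the human-readable part) are different things.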
We can check heartbeat too:
[dan@webserver:~] $ /usr/local/libexec/nagios/check_ups -H slocum -u heartbeat
UPS OK - Status=Online |
That first check has been added to Nagios, for the primary and for each secondary.
I think that if you are enabling a service, that service should be monitored. It pays to be alerted to a problem before you notice the problem.
The second check should not be added. It constantly alternates between UPS WARNING – Status=On Battery and UPS OK – Status=Online, by design. You don’t want that. I don’t either.
Customized nut port
I am using a customized version of the port. I noticed permission issues and a missing syslog.d directory which I wanted to fix. I’ll push this patch upstream soon.
Index: pkg-plist
===================================================================
--- pkg-plist   (revision 547408)
+++ pkg-plist   (working copy)
@@ -1,10 +1,10 @@
 %%NUT_CGI%%%%CGIDIR%%/upsimage.cgi
 %%NUT_CGI%%%%CGIDIR%%/upsset.cgi
 %%NUT_CGI%%%%CGIDIR%%/upsstats.cgi
-%%NUT_CGI%%@sample %%CGIETCDIR%%hosts.conf.sample
-%%NUT_CGI%%@sample %%CGIETCDIR%%upsset.conf.sample
-%%NUT_CGI%%@sample %%CGIETCDIR%%upsstats.html.sample
-%%NUT_CGI%%@sample %%CGIETCDIR%%upsstats-single.html.sample
+%%NUT_CGI%%@sample(root,uucp,0640) %%CGIETCDIR%%hosts.conf.sample
+%%NUT_CGI%%@sample(root,uucp,0640) %%CGIETCDIR%%upsset.conf.sample
+%%NUT_CGI%%@sample(root,uucp,0640) %%CGIETCDIR%%upsstats.html.sample
+%%NUT_CGI%%@sample(root,uucp,0640) %%CGIETCDIR%%upsstats-single.html.sample
 %%NUT_CGI%%%%WWWDIR%%/bottom.html
 %%NUT_CGI%%%%WWWDIR%%/header.html
 %%NUT_CGI%%%%WWWDIR%%/index.html
@@ -11,12 +11,12 @@
 %%NUT_CGI%%%%WWWDIR%%/nut-banner.png
 %%ETCDIR%%/cmdvartab
 %%ETCDIR%%/driver.list
-@sample %%ETCDIR%%/nut.conf.sample
-@sample %%ETCDIR%%/ups.conf.sample
-@sample %%ETCDIR%%/upsd.conf.sample
-@sample %%ETCDIR%%/upsd.users.sample
-@sample %%ETCDIR%%/upsmon.conf.sample
-@sample %%ETCDIR%%/upssched.conf.sample
+@sample(root,uucp,0640) %%ETCDIR%%/nut.conf.sample
+@sample(root,uucp,0640) %%ETCDIR%%/ups.conf.sample
+@sample(root,uucp,0640) %%ETCDIR%%/upsd.conf.sample
+@sample(root,uucp,0640) %%ETCDIR%%/upsd.users.sample
+@sample(root,uucp,0640) %%ETCDIR%%/upsmon.conf.sample
+@sample(root,uucp,0640) %%ETCDIR%%/upssched.conf.sample
 @sample %%EXAMPLESDIR%%/newsyslog.sample etc/newsyslog.conf.d/nut.conf
 @sample %%EXAMPLESDIR%%/syslog.sample etc/syslog.d/nut
 %%NUT_USB%%etc/devd/nut-usb.conf
@@ -247,6 +247,7 @@
 sbin/upsdrvctl
 sbin/upsmon
 sbin/upssched
+@dir etc/syslog.d
 @dir(%%NUT_USER%%,%%NUT_GROUP%%,750) %%STATEDIR%%
 @dir libexec/nut
 @dir(%%NUT_USER%%,%%NUT_GROUP%%,) /var/log/nut
[dan@pkg01:/usr/local/poudriere/ports/default/sysutils/nut] $
Things to notice
If you stop the primary, you’ll see messages like this on the secondary:
upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Server disconnected
upsmon[46271]: Communications with UPS ups02@bast.example.org lost
upsd[75566]: Data for UPS [ups02] is stale - check driver
And this wall broadcast:
Broadcast Message from dan@slocum.example.org (no tty) at 0:02 UTC...
Communications with UPS ups02@bast.example.org lost
A bit later:
upsmon[46271]: UPS [ups02@bast.example.org]: connect failed: Connection failure: Operation timed out
upsd[75566]: Data for UPS [ups02] is stale - check driver
After starting the master, I saw:
Sep 5 00:05:28 slocum upsmon[46271]: Login on UPS [ups02@bast.example.org] failed - got [ERR ACCESS-DENIED]
Sep 5 00:05:33 slocum upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Driver not connected
Sep 5 00:05:48 slocum syslogd: last message repeated 3 times
Sep 5 00:05:49 slocum upsd[75566]: UPS [ups02] data is no longer stale
Sep 5 00:05:53 slocum upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Driver not connected
Sep 5 00:06:28 slocum syslogd: last message repeated 7 times
Sep 5 00:07:14 slocum syslogd: last message repeated 9 times
Sep 5 00:07:14 slocum upsmon[46271]: UPS ups02@bast.example.org is unavailable

Broadcast Message from dan@slocum.int.unixathome.org (no tty) at 0:07 UTC...
UPS ups02@bast.int.unixathome.org is unavailable
This is a good time to say: sometimes the master does not start. I went back to the settings page, and hit save again.
Then I saw:
Sep 5 00:07:19 slocum upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Driver not connected
Sep 5 00:07:24 slocum syslogd: last message repeated 1 times
Sep 5 00:07:29 slocum upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Driver not connected
Sep 5 00:08:04 slocum syslogd: last message repeated 7 times
Sep 5 00:08:09 slocum upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Server disconnected
Sep 5 00:08:14 slocum upsmon[46271]: Login on UPS [ups02@bast.example.org] failed - got [ERR ACCESS-DENIED]
Sep 5 00:08:19 slocum upsmon[46271]: Poll UPS [ups02@bast.example.org] failed - Data stale
Sep 5 00:08:24 slocum upsmon[46271]: Communications with UPS ups02@bast.example.org established

Broadcast Message from dan@slocum.int.unixathome.org (no tty) at 0:08 UTC...
Communications with UPS ups02@bast.int.unixathome.org established
That’s it
It’s working. It needs to be tested. I hope to have that blog post ready soon.
I will also be timing shutdown and startup, so I can make sure everything comes up. I especially want the PostgreSQL database server online and ready to go.