Skip to the end of this post for the lesson part of this blog post.
This email arrived in my inbox yesterday at about 10:00 PM:
From: Cron Daemon <rsyncer@pg02.int.unixathome.org> To: dan@langille.org Subject: Cron <rsyncer@pg02> ~/bin/backup.sh > /dev/null X-Cron-Env: <SHELL=/bin/sh> X-Cron-Env: <MAILTO=dan@langille.org> X-Cron-Env: <PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:$HOME/bin; export PATH> X-Cron-Env: <LOGNAME=rsyncer> X-Cron-Env: <USER=rsyncer> Date: Mon, 16 Oct 2023 02:02:31 +0000 Message-Id: <652c99b7.554c3.70a41a23@pg02.int.unixathome.org> pg_dump: error: query failed: ERROR: permission denied for sequence cache_clearing_files_id_seq pg_dump: error: query was: SELECT last_value, is_called FROM public.cache_clearing_files_id_seq
This is the backup script for my database dumps on my server at home.
I immediately recognized it as the follow-on from a table I had just added. It was late, I was headed to bed. I forgot about it.
Until this morning.
This morning
This morning, Nagios was telling me:
/usr/home/rsyncer/backups/pg02/database-backup/postgresql/freshports.devgit.dump has size zero. /usr/home/rsyncer/backups/aws-1 has size zero.
Ahh, that’s because the dump failed. The files are empty.
There was another email:
From: Cron Daemon <dan@dbclone.int.unixathome.org> To: dan@langille.org Subject: Cron <dan@dbclone> /usr/bin/lockf -t 0 /tmp/.dan.lockf.test-backups-all.sh ${HOME}/bin/test-backups-all.sh | grep -v 'ERROR: language "plpgsql" already exists' X-Cron-Env: <SHELL=/bin/sh> X-Cron-Env: <MAILTO=dan@langille.org> X-Cron-Env: <PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:$HOME/bin; export PATH> X-Cron-Env: <LOGNAME=dan> X-Cron-Env: <USER=dan> Date: Mon, 16 Oct 2023 10:02:46 +0000 Message-Id: <652d0a46.124fa7.17dd2530@dbclone.int.unixathome.org> pg_restore: error: input file is too short (read 0, expected 5)
This was the database dump testing script failing because one of the dump files was empty. This script loads up each database dump into a server to make sure it can load without errors.
The solution
The solution was to re-invoke these permissions against each of the databases which contained this new table:
[13:13 pg02 dan ~] % psql freshports.devgit psql (12.16) Type "help" for help. freshports.devgit=# grant select on all tables in schema public to rsyncer; grant select on all sequences in schema public to rsyncer; GRANT GRANT freshports.devgit=# \c freshports.stagegit You are now connected to database "freshports.stagegit" as user "dan". freshports.stagegit=# grant select on all tables in schema public to rsyncer; grant select on all sequences in schema public to rsyncer; GRANT GRANT freshports.stagegit=# \c freshports.testgit You are now connected to database "freshports.testgit" as user "dan". freshports.testgit=# grant select on all tables in schema public to rsyncer; grant select on all sequences in schema public to rsyncer; GRANT GRANT freshports.testgit=#
The lesson
If you have a script doing something, make sure any errors get sent to you for later action. Yes, I did not act on the first email.
If you have a result you are expecting, monitor that result. In my case, the monitoring was looking for zero-length database dumps.
Having multiple ways to check/monitor a desired result means you will know about the problem earlier and have more time to fix it. In my case, I could have avoided the Nagios alerts and the failed database test if I had acted upon the first email.