Nagios was telling me:
FILE_AGE WARNING: /usr/home/rsyncer/backups/aws-1/postgresql/freshports.org.dump is 117636 seconds old and 3608113799 bytes
That means the FreshPorts backup is more than a day old, and it should have been refreshed by now.
OK, let’s go look.
I log into the host known as aws-1 and check the files. They look fresh to me:
[rsyncer@aws-1 ~/backups/database-backup/postgresql]$ ls -lt total 3525154 -rw-r--r-- 1 rsyncer rsyncer 3963 Jul 16 02:14 globals.sql drwxr-xr-x 2 rsyncer rsyncer 4 Jul 16 02:14 archive -rw-r--r-- 1 rsyncer rsyncer 3610129659 Jul 16 02:14 freshports.org.dump
I don’t see any rsync in progress (I checked using ps auwwx | grep rysnc).
Let’s try this manually
Looking at my backup script, the last thing I see is:
/usr/bin/ssh -i ${HOME}/.ssh/id_ed25519.dump dbclone.int.unixathome.org rsync-backup-from-aws-1-dump.sh
That’s the rsyncer user telling the dbclone to run the rsync-backup-from-aws-1-dump.sh script. More on that later.
Let’s try this manually:
[rsyncer@aws-1 ~]$ /usr/bin/ssh -i ${HOME}/.ssh/id_ed25519.dump dbclone.int.unixathome.org rsync-backup-from-aws-1-dump.sh bash: line 1: /usr/local/sbin/rrsync: cannot execute: required file not found rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: remote command not found (code 127) at io.c(231) [Receiver=3.3.0]
What?
Let’s go to dbclone and check ~/.ssh/authorized_keys:
from="aws-1.startpoint.vpn.unixathome.org,192.0.2.130",command="/home/rsyncer/bin/rsync-backup-from-aws-1-dump.sh" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOuHMg9eBDIM3UXRzMW861DEfvXfwnZvRMC6LCz3dCcS rsyncer@aws-1.int.unixathome.org
This says:
- When the connection is from aws-1.startpoint.vpn.unixathome.org,192.0.2.130…
- invoke the script /home/rsyncer/bin/rsync-backup-from-aws-1-dump.sh
- And allow this to happen only if the key they are using is the one with that hash shown
This tightly restricts the incoming ssh connection to one thing, and one thing only.
The script
Let’s look at the script:
[rsyncer@dbclone ~]$ cat /home/rsyncer/bin/rsync-backup-from-aws-1-dump.sh #!/bin/sh # # This file does a backup of each database on the server. # It relies upon a file on the server to do the actual backup, # then uses rsync to copy the files from the server to here. # # the ~/.ssh/authorized keys entry on the server must look like this: # # from="192.0.2.140",command="/usr/local/sbin/rrsync /usr/home/backups/" [ssh key goes here] # # invoking rrsync ensures the incoming rsync process is chrooted to /usr/home/dan/backups/ # Thus, BACKUPDIR is relative to that location. BACKUPDIR=${HOME}/backups/aws-1 IDENTITY_FILE_RSYNC=${HOME}/.ssh/id_ed25519 SERVER_TO_RSYNC=aws-1.vpn.unixathome.org cd ${BACKUPDIR} # # use rsync to get local copies # /usr/local/bin/rsync -e "/usr/bin/ssh -i ${IDENTITY_FILE_RSYNC}" --recursive -av --stats --progress --exclude 'archive' ${SERVER_TO_RSYNC}:/ ${BACKUPDIR}
This script invokes rsync over ssh, effectively pulling it from aws-1 to dbclone.
To recap
So far:
- The rsyncer user on aws-1 run pg_dump
- rsyncer invokes ssh to dbclone
- On dbclone, an rsync is started, pulling the file to dbclone
- Yes, this is circular, but there is sound logic behind it. More, later.
So far, so good.
Running the command
Let’s try running the script and see what happens here.
Here, I manually invoke each line of the script to see what’s going on:
[rsyncer@dbclone ~]$ /bin/sh $ BACKUPDIR=${HOME}/backups/aws-1 $ $ IDENTITY_FILE_RSYNC=${HOME}/.ssh/id_ed25519 $ SERVER_TO_RSYNC=aws-1.vpn.unixathome.org $ $ cd ${BACKUPDIR} $ /usr/local/bin/rsync -e "/usr/bin/ssh -i ${IDENTITY_FILE_RSYNC}" --recursive -av --stats --progress --exclude 'archive' ${SERVER_TO_RSYNC}:/ ${BACKUPDIR} bash: line 1: /usr/local/sbin/rrsync: cannot execute: required file not found rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(231) [Receiver=3.3.0]
Oh. WTF?
Now let’s look at the command being run back on aws-1.
The aws-1 command
Back on aws-1, we have this ~/.ssh/authorized_keys entry:
from="192.0.2,191.0.2.140",command="/usr/local/sbin/rrsync /home/rsyncer/backups/database-backup" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILcVvbs7FiGm1rmo5OxroXDlY7cytxh4Su8ZYz2u/TB2 rsyncer@dbclone.int.unixathome.org
Why the IP address twice? This host cannot resolve that IP address to a hostname. Although I see now there is an entry in /etc/hosts for that IP address.
This authorized_keys entry is similar to what we saw earlier.
By now, and I should have suspected this earlier, I’m looking at /usr/local/sbin/rrsync. Let me see:
$ ls -l /usr/local/sbin/rrsync -r-xr-xr-x 1 root wheel 12865 Apr 28 04:18 /usr/local/sbin/rrsync
Yeah, it’s there. Looking:
[rsyncer@aws-1 ~]$ file /usr/local/sbin/rrsync /usr/local/sbin/rrsync: Python script, ASCII text executable [rsyncer@aws-1 ~]$ head /usr/local/sbin/rrsync #!/usr/local/bin/python3.9 # Restricts rsync to subdirectory declared in .ssh/authorized_keys. See # the rrsync man page for details of how to make use of this script. # NOTE: install python3 braceexpand to support brace expansion in the args! # Originally a perl script by: Joe Smith30-Sep-2004 # Python version by: Wayne Davison [rsyncer@aws-1 ~]$
Yeah, well, it’s python.
And I just updated Python on this host
I recently moved from Python 3.9 to Python 3.11 on this host:
[rsyncer@aws-1 ~]$ ls -l /usr/local/bin/python3.9 ls: /usr/local/bin/python3.9: No such file or directory [rsyncer@aws-1 ~]$ pkg info -x python py311-python-dateutil-2.9.0 python-3.11_3,2 python3-3_4 python311-3.11.9_1
Will reinstall bring in the right header?
[12:43 aws-1 dan ~] % sudo pkg install -f rsync Updating local repository catalogue... Fetching meta.conf: 100% 178 B 0.2kB/s 00:01 Fetching data.pkg: 100% 305 KiB 312.7kB/s 00:01 Processing entries: 100% local repository update completed. 1123 packages processed. All repositories are up to date. The following 1 package(s) will be affected (of 0 checked): Installed packages to be REINSTALLED: rsync-3.3.0 Number of packages to be reinstalled: 1 385 KiB to be downloaded. Proceed with this action? [y/N]: y [1/1] Fetching rsync-3.3.0.pkg: 100% 385 KiB 394.2kB/s 00:01 Checking integrity... done (0 conflicting) [1/1] Reinstalling rsync-3.3.0... [1/1] Extracting rsync-3.3.0: 100% [12:43 aws-1 dan ~] % head 1 head /usr/local/sbin/rrsync head: 1: No such file or directory head: head: No such file or directory ==> /usr/local/sbin/rrsync <== #!/usr/local/bin/python3.11 # Restricts rsync to subdirectory declared in .ssh/authorized_keys. See # the rrsync man page for details of how to make use of this script. # NOTE: install python3 braceexpand to support brace expansion in the args! # Originally a perl script by: Joe Smith30-Sep-2004 # Python version by: Wayne Davison [12:44 aws-1 dan ~] %
Success!
Run it!
Here we go!
[rsyncer@aws-1 ~/bin]$ /usr/bin/ssh -i ${HOME}/.ssh/id_ed25519.dump dbclone.int.unixathome.org rsync-backup-from-aws-1-dump.sh receiving incremental file list postgresql/
That’s still in progress.
Talk about a better shebang
So why not use /usr/local/bin/python instead of /usr/local/bin/python3.9
Good question.
That fix is handled by shebangfix and this line:
SHEBANG_FILES= support/rrsync
… in net/rsync/Makefile
I’m not sure why it works the way it does.
Why ssh, then rsync
You might ask:
Why do the rsync from the destination (pull) and not from the source (push).
This is to protect the destination, at home, in my basement. If the source gets compromised, all they can do is run another rsync.
So why not run the rsync from the destination anyway? Why invoke it from the source?
Timing.
I want the rsync run when the backup is done. The simple way to know when it’s done is to let the remote host run it right after the dump.
It’s done
While typing the above, it finished:
[rsyncer@aws-1 ~/bin]$ /usr/bin/ssh -i ${HOME}/.ssh/id_ed25519.dump dbclone.int.unixathome.org rsync-backup-from-aws-1-dump.sh receiving incremental file list postgresql/ postgresql/freshports.org.dump 3,610,129,659 100% 21.45MB/s 0:02:40 (xfr#1, to-chk=1/4) postgresql/globals.sql 3,963 100% 4.88kB/s 0:00:00 (xfr#2, to-chk=0/4) Number of files: 4 (reg: 2, dir: 2) Number of created files: 0 Number of deleted files: 0 Number of regular files transferred: 2 Total file size: 3,610,133,622 bytes Total transferred file size: 3,610,133,622 bytes Literal data: 349,726,356 bytes Matched data: 3,260,407,266 bytes File list size: 141 File list generation time: 0.001 seconds File list transfer time: 0.000 seconds Total bytes sent: 480,719 Total bytes received: 350,029,118 sent 480,719 bytes received 350,029,118 bytes 2,080,177.07 bytes/sec total size is 3,610,133,622 speedup is 10.30 [rsyncer@aws-1 ~/bin]$
More importantly, Nagios is also happy: ** RECOVERY alert – dbclone/check_backup_files PRIMARY is OK **
Thank you for coming to my TED talk.