In a recent blog post, I outlined a problem I hit with pg_dump. Specifically, pg_dump from PostgreSQL 12-14 was picking up and using ~/.pgpass, but pg_dump from PostgreSQL 15-16 was failing.
In this blog post:
- FreeBSD 13.2
- PostgreSQL server 12 / 16
- PostgreSQL client 12-16
- Bacula 9.6.7
Today we figured out why: $HOME.
$HOME for the script was set to /
In PostgreSQL < 15, the code used the password database (i.e. /etc/passwd) to locate the home directory of the running user.
Stumbling through the environment variables
In PostgreSQL 15 and later, the code checks $HOME first (as shown in this commit) when locating the home directory.
The order is now:
- PGPASSFILE
- $HOME/.pgpass (starting with v15)
- ~/.pgpass (i.e. the home directory from /etc/passwd)
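The lookup order above can be sketched in shell. This is only my approximation of what a v15+ client is expected to do, based on the list above, and not the libpq code itself. resolve_pgpass is a hypothetical helper name, and the sketch assumes getent(1) is available for reading the password database.

```sh
#!/bin/sh
# Hypothetical helper: print the password file a v15+ client would be
# expected to try, following the order listed above.
resolve_pgpass() {
    if [ -n "${PGPASSFILE:-}" ]; then
        # 1. An explicit PGPASSFILE wins.
        printf '%s\n' "$PGPASSFILE"
    elif [ -n "${HOME:-}" ]; then
        # 2. Otherwise use $HOME (strip one trailing slash so HOME=/ gives /.pgpass).
        printf '%s\n' "${HOME%/}/.pgpass"
    else
        # 3. Otherwise fall back to the home directory in the password database.
        pw_home=$(getent passwd "$(id -un)" | cut -d: -f6)
        printf '%s\n' "${pw_home%/}/.pgpass"
    fi
}
```

With HOME=/ and no PGPASSFILE, resolve_pgpass prints /.pgpass, which is the situation my script was in.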
I stumbled across the problem this morning when pg_dump failed with a 16 client against a 16 server.
The output was:
22-Nov 13:36 bacula-dir JobId 361250: Start Backup JobId 361250, Job=BackupCatalog.2023-11-22_13.36.38_39
22-Nov 13:36 bacula-dir JobId 361250: There are no more Jobs associated with Volume "FullAutoNoNextPool-04-17758". Marking it purged.
22-Nov 13:36 bacula-dir JobId 361250: All records pruned from Volume "FullAutoNoNextPool-04-17758"; marking it "Purged"
22-Nov 13:36 bacula-dir JobId 361250: Recycled volume "FullAutoNoNextPool-04-17758"
22-Nov 13:36 bacula-dir JobId 361250: Using Device "vDrive-FullFileNoNextPool-0" to write.
22-Nov 13:36 dbclone-fd JobId 361250: shell command: run ClientRunBeforeJob "/usr/local/bacula/dump_catalog.sh"
22-Nov 13:36 dbclone-fd JobId 361250: ClientRunBeforeJob: Password:
22-Nov 13:36 dbclone-fd JobId 361250: ClientRunBeforeJob: pg_dump: error: connection to server at "pg03.int.unixathome.org" (10.55.0.34), port 5432 failed: fe_sendauth: no password supplied
22-Nov 13:36 bacula-sd-04 JobId 361250: Recycled volume "FullAutoNoNextPool-04-17758" on File device "vDrive-FullFileNoNextPool-0" (/usr/local/bacula/volumes/FullFileNoNextPool), all previous data lost.
22-Nov 13:36 bacula-dir JobId 361250: Max Volume jobs=1 exceeded. Marking Volume "FullAutoNoNextPool-04-17758" as Used.
22-Nov 13:36 bacula-sd-04 JobId 361250: Elapsed time=00:00:01, Transfer rate=154 Bytes/second
Line 7 was the clue: pg_dump was prompting for a password. I had removed --no-password from my script (added for testing).
However, /root/.pgpass existed, had the correct permissions, and the appropriate content:
[14:48 dbclone dan ~] % ls -l /root/.pgpass
-rw------- 1 root wheel 216 2023.11.21 13:18 /root/.pgpass
[14:49 dbclone dan ~] % sudo cat /root/.pgpass
#
# used by bacula-fd, which runs as root
# used in /usr/local/bacula/dump_catalog.sh
#hostname:port:database:username:password
#pg02.int.unixathome.org:5432:bacula:bacula:[redacted]
*:*:bacula:bacula:[also redacted]
[14:49 dbclone dan ~] %
The script (see this blog post for content)
I started by adding more debugging output to the script.
echo host=${HOST}
echo db=${DB}
echo user=${USER}
id
echo HOME=${HOME}
ls -l /root/.pgpass
file /root/.pgpass
This provided the following output:
22-Nov 14:34 dbclone-fd JobId 361259: shell command: run ClientRunBeforeJob "/usr/local/bacula/dump_catalog.sh"
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: host=pg02.int.unixathome.org
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: db=bacula
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: user=bacula
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: uid=0(root) gid=0(wheel) groups=0(wheel),5(operator)
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: HOME=/
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: -rw------- 1 root wheel 216 Nov 21 13:18 /root/.pgpass
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: /root/.pgpass: ASCII text
That was my first revelation that ${HOME} was not what it should be. ilmari pointed out: oh, the getenv("HOME") was added in pg15
Here’s what I tried next in the script:
echo HOME=${HOME}
export HOME=/root
echo HOME=${HOME}
With that change, the pg_dump worked.
The next test was:
id
echo HOME=${HOME}
#export HOME=/root
echo HOME=${HOME}
export PGPASSFILE=/root/.pgpass
ls -l /root/.pgpass
file /root/.pgpass
Specifically, I commented out the changing of HOME and added the setting of PGPASSFILE. This change also allowed the pg_dump to proceed.
NOTE: We had tried PGPASSFILE in a test yesterday, but I see it did not involve an export directive, so it was essentially an invalid test. Without export, the variable is not passed to any subprocesses (i.e. pg_dump in this case); an exported variable is.
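The difference export makes can be seen in a few lines of sh. This is a standalone sketch, not part of dump_catalog.sh; the names without_export and with_export are mine, and a child sh stands in for pg_dump.

```sh
#!/bin/sh
# Start from a clean slate so an inherited, already-exported value
# does not hide the effect.
unset PGPASSFILE

# Plain assignment: visible in this shell only.
PGPASSFILE=/root/.pgpass
without_export=$(sh -c 'printf "%s" "${PGPASSFILE:-unset}"')

# After export, child processes inherit the variable.
export PGPASSFILE
with_export=$(sh -c 'printf "%s" "${PGPASSFILE:-unset}"')

echo "without export, child sees: $without_export"
echo "with export, child sees:    $with_export"
```

The first child reports the variable as unset; the second sees /root/.pgpass. That is why yesterday's test told us nothing.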
In summary
In summary:
- the script was being launched with $HOME set to /
- pg_dump for PostgreSQL 12-14 was using the password database to locate HOME for the running user
- pg_dump for PostgreSQL 15-16 was looking at HOME, seeing it set to /, and using it, so it looked for .pgpass under / rather than /root and did not find it
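Given that, one way to make a script like mine independent of whatever HOME it inherits is to point PGPASSFILE at the file explicitly and fail early if the file is not readable. The following is a minimal sketch under that assumption; use_pgpass is a hypothetical helper name, and the pg_dump call appears only as a comment.

```sh
#!/bin/sh
# Hypothetical helper: export PGPASSFILE only if the given file is readable.
use_pgpass() {
    candidate=$1
    if [ ! -r "$candidate" ]; then
        echo "password file not readable: $candidate" >&2
        return 1
    fi
    PGPASSFILE=$candidate
    export PGPASSFILE
}

# Intended use inside the dump script (shown as comments only):
#   use_pgpass /root/.pgpass || exit 1
#   pg_dump --no-password ...
```

Keeping --no-password in place means a missing password shows up as an error rather than a prompt nobody is there to answer.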
See also: https://why-upgrade.depesz.com/show?from=12&to=16&keywords=home+directory
Action points: Upgrade Bacula to a newer release, see if the HOME directory changes.
EDIT 2023-11-23 – It has been reported that the HOME directory on Bacula 13 is /root.
EDIT 2023-11-23 – #2 – It is an unguarded secret that FreeBSD starts daemons with / as a HOME.
[16:27 empty dan /etc] % grep -i home /usr/sbin/service
exec env -i -L -/daemon HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin "$dir/$script" "$@"
bacula-fd receives this HOME value when it is started, and Bacula does not modify it.
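To reproduce that environment by hand, something like the following should be close. It is a rough approximation: I have left out the FreeBSD-specific -L -/daemon login-class options from the real service(8) command line, and service_env_home is just a name I picked.

```sh
#!/bin/sh
# Approximate the environment service(8) hands to a daemon:
# everything cleared except HOME and PATH.
# (The -L -/daemon options from the real command are omitted here.)
service_env_home=$(env -i HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin \
    sh -c 'printf "%s" "$HOME"')

echo "HOME as seen by the child: $service_env_home"
```

Running the dump script under the same env -i prefix is a quick way to check whether it still depends on HOME.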