I figured out why pg_dump was failing with PostgreSQL 15-16

In a recent blog post, I outlined a problem I hit with pg_dump. Specifically, pg_dump from PostgreSQL 12-14 was picking up and using ~/.pgpass, but pg_dump from PostgreSQL 15-16 was failing.

In this blog post:

  • FreeBSD 13.2
  • PostgreSQL server 12 / 16
  • PostgreSQL client 12-16
  • Bacula 9.6.7

Today we figured out why: $HOME.

$HOME for the script was set to /

In PostgreSQL < 15, the code used the password database (/etc/passwd) to determine the home directory and then picked up ~/.pgpass from there.

Stumbling through the environment variables

In PostgreSQL 15 and later, the code first checks $HOME (as shown in this commit).

The order is now:

  1. $PGPASSFILE, if set
  2. $HOME/.pgpass (starting with v15)
  3. ~/.pgpass (i.e. the home directory from /etc/passwd)
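
To make that order concrete, here is a rough shell approximation of the lookup. This is only a sketch of my understanding, not the actual libpq code: pgpass_path is a made-up helper name, and the getent fallback assumes a system (such as FreeBSD or Linux) where getent is available.

```shell
#!/bin/sh
# Hypothetical helper: print the password file a v15+ client would
# probably consult, given the current environment.
pgpass_path() {
    if [ -n "${PGPASSFILE:-}" ]; then
        # An explicit PGPASSFILE wins over everything else.
        printf '%s\n' "$PGPASSFILE"
    elif [ -n "${HOME:-}" ]; then
        # v15+: a non-empty HOME is trusted as-is.
        # ${HOME%/} strips a trailing slash so that HOME=/ gives /.pgpass
        printf '%s\n' "${HOME%/}/.pgpass"
    else
        # Fallback (assumption: getent exists): ask the password database.
        home_dir=$(getent passwd "$(id -u)" | cut -d: -f6)
        printf '%s\n' "${home_dir%/}/.pgpass"
    fi
}

pgpass_path
```

Run from an environment with HOME=/ and no PGPASSFILE, this prints /.pgpass, which is not where my password file lives.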

I stumbled across the problem this morning when pg_dump wasn’t running with a 16 client against a 16 server.

The output was:

22-Nov 13:36 bacula-dir JobId 361250: Start Backup JobId 361250, Job=BackupCatalog.2023-11-22_13.36.38_39
22-Nov 13:36 bacula-dir JobId 361250: There are no more Jobs associated with Volume "FullAutoNoNextPool-04-17758". Marking it purged.
22-Nov 13:36 bacula-dir JobId 361250: All records pruned from Volume "FullAutoNoNextPool-04-17758"; marking it "Purged"
22-Nov 13:36 bacula-dir JobId 361250: Recycled volume "FullAutoNoNextPool-04-17758"
22-Nov 13:36 bacula-dir JobId 361250: Using Device "vDrive-FullFileNoNextPool-0" to write.
22-Nov 13:36 dbclone-fd JobId 361250: shell command: run ClientRunBeforeJob "/usr/local/bacula/dump_catalog.sh"
22-Nov 13:36 dbclone-fd JobId 361250: ClientRunBeforeJob: Password: 
22-Nov 13:36 dbclone-fd JobId 361250: ClientRunBeforeJob: pg_dump: error: connection to server at "pg03.int.unixathome.org" (, port 5432 failed: fe_sendauth: no password supplied
22-Nov 13:36 bacula-sd-04 JobId 361250: Recycled volume "FullAutoNoNextPool-04-17758" on File device "vDrive-FullFileNoNextPool-0" (/usr/local/bacula/volumes/FullFileNoNextPool), all previous data lost.
22-Nov 13:36 bacula-dir JobId 361250: Max Volume jobs=1 exceeded. Marking Volume "FullAutoNoNextPool-04-17758" as Used.
22-Nov 13:36 bacula-sd-04 JobId 361250: Elapsed time=00:00:01, Transfer rate=154  Bytes/second

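Line 7 (the Password: prompt) was the clue. I had removed --no-password from my script (it had been added for testing), so pg_dump prompted for a password instead of failing outright.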

However, /root/.pgpass existed, had the correct permissions, and the appropriate content:

[14:48 dbclone dan ~] % ls -l /root/.pgpass
-rw-------  1 root  wheel  216 2023.11.21 13:18 /root/.pgpass
[14:49 dbclone dan ~] % sudo cat /root/.pgpass                    

# used by bacula-fd, which runs as root
# used in /usr/local/bacula/dump_catalog.sh
*:*:bacula:bacula:[also redacted]
[14:49 dbclone dan ~] %   
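
For what it's worth, the same check can be scripted. The client ignores a password file that grants any access to group or others, so a pre-flight test along these lines could go into a dump script. This is a sketch only; passfile_ok is a hypothetical helper name, not something from my script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is readable and
# grants no permissions at all to group or others.
passfile_ok() {
    [ -r "$1" ] || return 1
    # Characters 5-10 of the ls mode string are the group and other bits.
    mode=$(ls -ldL "$1" | cut -c5-10)
    [ "$mode" = "------" ]
}

# Non-fatal example call against whichever file is expected to be used.
passfile_ok "${PGPASSFILE:-$HOME/.pgpass}" ||
    echo "password file missing, unreadable, or too permissive" >&2
```

A file with mode 0600 passes; the same file with mode 0644 does not.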

The script (see this blog post for content)

I started by adding more debugging output to the script.

echo host=${HOST}
echo db=${DB}
echo user=${USER}
# show which user the script is really running as
id
echo HOME=${HOME}
ls -l /root/.pgpass
file /root/.pgpass

This provided the following output:

22-Nov 14:34 dbclone-fd JobId 361259: shell command: run ClientRunBeforeJob "/usr/local/bacula/dump_catalog.sh"
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: host=pg02.int.unixathome.org
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: db=bacula
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: user=bacula
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: uid=0(root) gid=0(wheel) groups=0(wheel),5(operator)
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: HOME=/
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: -rw-------  1 root  wheel  216 Nov 21 13:18 /root/.pgpass
22-Nov 14:34 dbclone-fd JobId 361259: ClientRunBeforeJob: /root/.pgpass: ASCII text

That was my first revelation that ${HOME} was not what it should be. ilmari pointed out: oh, the getenv("HOME") was added in pg15

Here’s what I tried next in the script:

echo HOME=${HOME}
export HOME=/root
echo HOME=${HOME}

With that change, the pg_dump worked.

The next test was:

echo HOME=${HOME}
#export HOME=/root
echo HOME=${HOME}
export PGPASSFILE=/root/.pgpass
ls -l /root/.pgpass
file /root/.pgpass

Specifically, I commented out the changing of HOME and added the setting of PGPASSFILE. This change also allowed the pg_dump to proceed.

NOTE: We had tried PGPASSFILE in a test yesterday, but I see it did not involve an export directive, so it was essentially an invalid test. Without export, the variable is not passed to any subprocesses (i.e. pg_dump in this case); only exported variables are.
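
The difference is easy to demonstrate. In this small sketch, DEMO_PASSFILE is a made-up variable name standing in for PGPASSFILE, and a child sh stands in for pg_dump:

```shell
#!/bin/sh
# Start clean so an inherited, already-exported value cannot skew the demo.
unset DEMO_PASSFILE

# Plain assignment: visible in this shell only.
DEMO_PASSFILE=/root/.pgpass
without_export=$(sh -c 'echo "${DEMO_PASSFILE:-unset}"')

# After export, child processes inherit the variable.
export DEMO_PASSFILE
with_export=$(sh -c 'echo "${DEMO_PASSFILE:-unset}"')

echo "without export: ${without_export}"
echo "with export:    ${with_export}"
```

The child reports unset in the first case and /root/.pgpass in the second, which is why yesterday's test told us nothing.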

In summary

  1. the script was being launched with $HOME set to /
  2. pg_dump for PostgreSQL 12-14 was using the password database to locate HOME for the running user
  3. pg_dump for PostgreSQL 15-16 was looking at HOME, seeing it set, and using it, therefore looking for /.pgpass instead of /root/.pgpass

See also: https://why-upgrade.depesz.com/show?from=12&to=16&keywords=home+directory

Action points: Upgrade Bacula to a newer release, see if the HOME directory changes.

EDIT 2023-11-23 – It has been reported that the HOME directory on Bacula 13 is /root.

EDIT 2023-11-23 – #2 – It is an unguarded secret that FreeBSD starts daemons with / as a HOME.

[16:27 empty dan /etc] % grep -i home /usr/sbin/service
		exec env -i -L -/daemon HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin "$dir/$script" "$@"

bacula-fd inherits this HOME value when it is started, and Bacula does not modify it.
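
The effect can be reproduced on any system with a similar env -i invocation. This is a simplified sketch, not what service(8) runs verbatim: the FreeBSD-specific -L -/daemon option is left out, and a child sh stands in for the daemon.

```shell
#!/bin/sh
# Launch a child with a scrubbed environment: only HOME and PATH survive.
child_home=$(env -i HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin \
    sh -c 'printf "%s\n" "$HOME"')

echo "child sees HOME=${child_home}"
```

Whatever HOME is in the calling shell, the child sees /, and so does everything it launches afterwards, including a ClientRunBeforeJob script.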
