FYI, this problem has been solved and described in a follow-up post: I figured out why pg_dump was failing with PostgreSQL 15-16.
This morning I encountered this error message:
20-Nov 03:25 dbclone-fd JobId 361156: ClientRunBeforeJob: pg_dump: error: connection to server at "pg02.int.example.org" (10.55.0.32), port 5432 failed: fe_sendauth: no password supplied
In this post:
- FreeBSD 13.2
- PostgreSQL 12.16 (server – pg02)
- PostgreSQL 16.1 (client – dbclone)
The backstory:
- dbclone dumps the bacula database hosted on pg02.
- The script to do this is: /usr/local/bacula/dump_catalog.sh
- The host, user, and database name are stored in /usr/local/bacula/config.sh
- The password is stored in /root/.pgpass
This script ran correctly on Saturday night. On Sunday night, it failed.
On Sunday, I upgraded the PostgreSQL client on dbclone from postgresql12-client-12.17 to postgresql16-client-16.1.
The script
[17:38 dbclone dan ~] % sudo cat /usr/local/bacula/dump_catalog.sh
#!/bin/sh

CONF=/usr/local/bacula/config.sh
DIR=/usr/home/rsyncer/backups/bacula-database/postgresql

if [ -r $CONF ]
then
  . $CONF
else
  echo Configuration file does not exist: $CONF
  exit 3
fi

#/usr/local/bin/pg_dump -Z0 -Fc --host ${HOST} --username ${USER} ${DB} --no-password > ${DIR}/${DB}.dump.tmp
/usr/local/bin/pg_dump "sslmode=require host=${HOST} user=${USER} dbname=${DB}" --no-password -Z0 -Fc > ${DIR}/${DB}.dump.tmp

DUMP_RESULT=$?
if [ ${DUMP_RESULT} != 0 ]; then
  exit ${DUMP_RESULT}
fi

mv ${DIR}/${DB}.dump.tmp ${DIR}/${DB}.dump

# allow rsyncer users to read this file
/usr/sbin/chown rsyncer:rsyncer ${DIR}/${DB}.dump
The commented-out pg_dump invocation is what was there before the error; the new invocation, using a connection string with sslmode, was added today while working on this issue.
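As an aside, one way to take any home-directory lookup out of the equation entirely is libpq's standard PGPASSFILE environment variable, which overrides the default ~/.pgpass location. A sketch (not the fix described in the follow-up post) of what could go near the top of dump_catalog.sh:

```shell
#!/bin/sh
# Sketch: pin the password file explicitly so libpq does not have to
# derive its location from the home directory. PGPASSFILE is a standard
# libpq environment variable; the path matches the file used in this post.
PGPASSFILE=/root/.pgpass
export PGPASSFILE

# pg_dump (via libpq) will now consult /root/.pgpass regardless of what
# HOME happens to be in the calling job's environment.
```

This does not explain the version difference, but it removes one variable from the experiment.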
The configuration file
The configuration file contains:
[17:39 dbclone dan ~] % cat /usr/local/bacula/config.sh
#!/bin/sh

# this is the configuration file for other scripts in this directory

USER=bacula
HOST=pg02.int.example.org
DB=bacula
.pgpass
Here is the .pgpass file in question:
[17:50 dbclone dan ~] % sudo cat ~root/.pgpass
#
# used by bacula-fd, which runs as root
# used in /usr/local/bacula/dump_catalog.sh
#hostname:port:database:username:password
pg02.int.example.org:5432:bacula:bacula:not_real_password
*:*:bacula:bacula:not_real_password
That pg02 line was added while writing this post. It didn’t help.
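For reference, libpq's .pgpass matching rules (fields are host:port:database:username:password, a literal `*` matches any value in the first four fields, and the first matching line wins) can be mimicked in plain shell to check which line a given connection would hit. A simplified sketch that skips lines starting with `#` and ignores libpq's backslash-escape handling:

```shell
#!/bin/sh
# Sketch of libpq's .pgpass lookup: the first line whose host, port,
# database, and user fields match (literally, or as "*") supplies the
# password. Backslash escapes and colons in passwords are not handled.
pgpass_lookup() {
    # $1=file $2=host $3=port $4=database $5=user
    awk -F: -v h="$2" -v p="$3" -v d="$4" -v u="$5" '
        /^#/ { next }
        ($1 == h || $1 == "*") && ($2 == p || $2 == "*") &&
        ($3 == d || $3 == "*") && ($4 == u || $4 == "*") {
            print $5; exit
        }' "$1"
}
```

With the file above, `pgpass_lookup ~root/.pgpass pg02.int.example.org 5432 bacula bacula` would print the stored password from the first matching line, so the matching itself is not the problem here.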
Running manually
Running the script manually succeeds:
[17:54 dbclone dan ~] % sudo /usr/local/bacula/dump_catalog.sh
^Cpg_dump: terminated by user
If I comment out the line in /root/.pgpass, I get the error mentioned above:
[17:55 dbclone dan ~] % sudo /usr/local/bacula/dump_catalog.sh
pg_dump: error: connection to server at "pg02.int.example.org" (10.55.0.32), port 5432 failed: fe_sendauth: no password supplied
My theory: when the script runs, either it can't read .pgpass, or it's reading a different .pgpass than the one shown here.
To test that theory, I added this line to the script:
logger $(id)
When running the script now, this appears in /var/log/messages:
Nov 20 17:55:18 dbclone dan[42045]: uid=0(root) gid=0(wheel) groups=0(wheel),5(operator)
It's interesting that dan shows up in that output; I have no idea why that occurs.
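One more thing that might be worth logging alongside id: libpq locates .pgpass relative to the home directory (or via PGPASSFILE, when that is set), so any difference in those variables between an interactive sudo shell and the job's environment would matter. A hypothetical extension of the logging line:

```shell
#!/bin/sh
# Sketch: report the variables libpq cares about when looking for
# .pgpass, in addition to the uid/gid information already being logged.
pgpass_env_report() {
    echo "HOME=${HOME:-unset} PGPASSFILE=${PGPASSFILE:-unset} $(id)"
}

# In the dump script this could go to syslog instead:
#   logger "$(pgpass_env_report)"
```

If the cron, sudo, and Bacula runs report different values here, that would be a lead.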
So far
So far I have confirmed that the script, when run via sudo, runs as root and picks up the password from the /root/.pgpass file.
Running from Bacula
Again, this Bacula job was running fine two days ago. From what I can tell, the only difference is the Bacula client package.
When I run the Bacula job, the same log entry appears, along with this error message:
20-Nov 18:06 dbclone-fd JobId 361173: ClientRunBeforeJob: pg_dump: error: connection to server at "pg02.int.example.org" (10.55.0.32), port 5432 failed: fe_sendauth: no password supplied
Installing the old client
Installing postgresql12-client-12.17 results in a successful backup.
Versions 13 and 14 also work. Versions 15 and 16 do not.
We went through a lot of checks today. Still no idea why this fails.
At present, I’m considering using PostgreSQL 14 instead of 16.
Conclusions
Testing shows there is a difference between the pg_dump supplied with PostgreSQL 12-14 and the pg_dump from PostgreSQL 15 and 16; the precise problem has not been tracked down.
However, after a walk, I’ve figured out that this is not a blocker for me. The primary objective is to replace an instance of PostgreSQL 12 with PostgreSQL 16. This will be done by migrating the databases from the existing database server (pg02) to a new database server running v16 (pg03).
Why was I trying to dump a 12 database with a 16 client?
dbclone is a backup-testing host. It takes pg_dump output and runs it through pg_restore, loading it into a clean PostgreSQL database. It alerts me to any restore problems before a restore is actually needed. Because of that, dbclone needs to be running the most recent version of PostgreSQL I have installed (e.g. don't dump a 16 database with 12 tools; always use a version that matches or exceeds the database you are dumping).
There will be a time period when both 12 (pg02) and 16 (pg03) are in concurrent use. For that time period, dbclone will be running the PostgreSQL 14 client (mostly because that's what's on there now). It will only take one day to migrate the Bacula database. When completed, dbclone will be updated to the 16 client (which presumably will just work with a 16 server on pg03). That initial dump is happening right now.
My thanks to everyone who helped me work through this.
EDIT 2023-11-22
No, the 16 client does not help when working with a 16 server. The problem persists today:
22-Nov 13:36 bacula-dir JobId 361250: Start Backup JobId 361250, Job=BackupCatalog.2023-11-22_13.36.38_39
22-Nov 13:36 bacula-dir JobId 361250: There are no more Jobs associated with Volume "FullAutoNoNextPool-04-17758". Marking it purged.
22-Nov 13:36 bacula-dir JobId 361250: All records pruned from Volume "FullAutoNoNextPool-04-17758"; marking it "Purged"
22-Nov 13:36 bacula-dir JobId 361250: Recycled volume "FullAutoNoNextPool-04-17758"
22-Nov 13:36 bacula-dir JobId 361250: Using Device "vDrive-FullFileNoNextPool-0" to write.
22-Nov 13:36 dbclone-fd JobId 361250: shell command: run ClientRunBeforeJob "/usr/local/bacula/dump_catalog.sh"
22-Nov 13:36 dbclone-fd JobId 361250: ClientRunBeforeJob: Password:
22-Nov 13:36 dbclone-fd JobId 361250: ClientRunBeforeJob: pg_dump: error: connection to server at "pg03.int.example.org" (10.55.0.34), port 5432 failed: fe_sendauth: no password supplied
22-Nov 13:36 bacula-sd-04 JobId 361250: Recycled volume "FullAutoNoNextPool-04-17758" on File device "vDrive-FullFileNoNextPool-0" (/usr/local/bacula/volumes/FullFileNoNextPool), all previous data lost.
22-Nov 13:36 bacula-dir JobId 361250: Max Volume jobs=1 exceeded. Marking Volume "FullAutoNoNextPool-04-17758" as Used.
22-Nov 13:36 bacula-sd-04 JobId 361250: Elapsed time=00:00:01, Transfer rate=154 Bytes/second
I’m very sure that’s a prompt for a password right there.
However, the script is running as root:
Nov 22 13:36:41 dbclone dan[35045]: uid=0(root) gid=0(wheel) groups=0(wheel),5(operator)
And root has:
[13:40 dbclone dan ~/tmp] % ls -l /root/.pgpass
-rw-------  1 root  wheel  216 2023.11.21 13:18 /root/.pgpass
Which contains:
[13:43 dbclone dan ~/tmp] % sudo cat /root/.pgpass
#
# used by bacula-fd, which runs as root
# used in /usr/local/bacula/dump_catalog.sh
#hostname:port:database:username:password
#pg02.int.example.org:5432:bacula:bacula:[redacted]
*:*:bacula:bacula:[redacted]
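Another libpq rule worth double-checking here: a .pgpass file with group or world access is silently ignored. The listing above already shows 0600, but a quick scripted check (a sketch using ls -l, since stat(1) flags differ between FreeBSD and Linux) could guard against a permissions regression:

```shell
#!/bin/sh
# Sketch: verify a .pgpass-style file is private (mode 0600 or stricter),
# since libpq silently skips it otherwise. Parses ls -l output for
# portability between FreeBSD and Linux stat(1) flags.
pgpass_mode_ok() {
    mode=$(ls -l "$1" | cut -c1-10)
    case "$mode" in
        -rw-------|-r--------) return 0 ;;
        *)                     return 1 ;;
    esac
}
```

Something like `pgpass_mode_ok /root/.pgpass || logger ".pgpass permissions too open"` could run alongside the existing id logging.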
The mystery and frustration continue. For those who claim this is a Bacula-only issue, I want to point out: something must have changed with PostgreSQL 15, because I went through a process where the only variable was the client version.
Running this from within a cronjob succeeds:
[13:53 dbclone dan /usr/local/etc/cron.d] % cat backup-bacula
51 * * * * root /usr/local/bacula/dump_catalog.sh
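Since the same script succeeds from cron but fails under Bacula, diffing the two environments directly might narrow things down. A hypothetical sketch; the /tmp paths and labels are made up for illustration:

```shell
#!/bin/sh
# Sketch: capture the environment a job runner actually provides, so a
# cron run and a Bacula ClientRunBeforeJob run can be compared afterwards.
capture_env() {
    # $1 = label for this caller, e.g. "cron" or "bacula"
    env | sort > "/tmp/env-$1.txt"
}

# Call capture_env from the script under each runner, then compare:
#   diff /tmp/env-cron.txt /tmp/env-bacula.txt
```

Any variable present in one file but not the other would be a candidate for what changed between the two contexts.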
FYI, see the conclusion.
When I upgraded PostgreSQL from 9.6 to 13, I had to adjust my standalone backup (dump) script, which used 'pg_dumpall -o'. I had to remove the -o option, as it is no longer supported in 13. Unfortunately, I am unable to find any reference to what -o was, or whether it was removed in 13 or even earlier. But maybe it points you in the right direction to figure out what is wrong in your case.
See also (just posted now): https://dan.langille.org/2023/11/22/i-figured-out-why-pg_dump-was-failing-with-postgresql-15-16/
-o is for OIDs: https://www.postgresql.org/docs/9.6/app-pg-dumpall.html
See the bottom of https://www.postgresql.org/docs/ where there is a link to the archive.