Lately, I’ve been preparing to move from PostgreSQL 12 to PostgreSQL 16 – my main development database server at home needs to get updated. The goal: migrate each database from the old host (running PostgreSQL 12) to the new host (running PostgreSQL 16) using pg_dump and pg_restore.
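For context, each migration followed the usual dump-and-restore pattern. This is only a sketch: the short host names stand in for the real servers, and the exact flags (and any encoding/locale options) I used may have differed.

# Dump from the old server in custom format (assumed), then restore
# into a freshly created database on the new server.
pg_dump    -h pg02 -U postgres -Fc -f bacula.dump bacula
createdb   -h pg03 -U postgres bacula
pg_restore -h pg03 -U postgres -d bacula -j 4 bacula.dump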
Today, I decided to migrate the Bacula database. I ran a pg_dump on pg02, then a pg_restore on pg03. That took about 6 hours, I think. There’s the database, ready to use:
[21:30 pg03 dan ~rsyncer] % psql -l bacula
                                                          List of databases
          Name          |  Owner   | Encoding  | Locale Provider | Collate |  Ctype  | ICU Locale | ICU Rules |   Access privileges   
------------------------+----------+-----------+-----------------+---------+---------+------------+-----------+-----------------------
 bacula                 | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 empty                  | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | 
 fpphorum               | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freebsddiary.old       | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freebsddiary.org       | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freshports.devgit      | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freshports.mobile      | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freshports.stagegit    | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freshports.testgit     | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 freshports.testgit.old | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 fsphorum               | postgres | SQL_ASCII | libc            | C       | C.UTF-8 |            |           | 
 gitea                  | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | 
 nagiostest             | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | 
 postgres               | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | 
 postgresqleu           | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | 
 samdrucker             | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | 
 template0              | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | =c/postgres          +
                        |          |           |                 |         |         |            |           | postgres=CTc/postgres
 template1              | postgres | UTF8      | libc            | C       | C.UTF-8 |            |           | =c/postgres          +
                        |          |           |                 |         |         |            |           | postgres=CTc/postgres
(18 rows)
[21:30 pg03 dan ~rsyncer] %
Let’s run a Bacula test
Let’s run a backup, see how it goes. I opened /usr/local/etc/bacula/bacula-dir.conf and found the reference to the Catalog. This is what I saw:
# Generic catalog service
Catalog {
  Name = MyCatalog
  dbname = bacula; dbaddress = pg03.int.unixathome.org; user = bacula; password = "[redacted]"
  # MultipleConnections = yes
}
Oh shit.
Bacula was configured to use the database server I just migrated to. Any data in that database has now been lost. I’ve “lost” the metadata about the backups (but not the backups themselves).
Confirming the date
When did this change occur?
[21:02 bacula dan /usr/local/etc/bacula] % ls -l bacula-dir.conf
-rw-r-----  1 root  bacula  8015 2023.11.20 13:49 bacula-dir.conf
That’s yesterday at about 9 AM local time.
Damn. What this means is the Bacula database on pg03 had all the backups from last night. I then overwrote that with the database from pg02, which did not have that data.
Options to get the data back
To be clear, the backed up data is still there. It is just not listed in the Catalog (which records what was backed up, when, and where it is now).
That metadata (missing from the Catalog) can be recovered using bscan, a command-line utility. From the documentation:
If you find yourself using this program, you have probably done something wrong. For example, the best way to recover a lost or damaged Bacula database is to reload the database by using the bootstrap file that was written when you saved it (default bacula-dir.conf file).
The bscan program can be used to re-create a database (catalog) records from the backup information written to one or more Volumes. This is normally needed only if one or more Volumes have been pruned or purged from your catalog so that the records on the Volume are no longer in the catalog, or for Volumes that you have archived. Note, if you scan in Volumes that were previously purged, you will be able to do restores from those Volumes. However, unless you modify the Job and File retention times for the Jobs that were added by scanning, the next time you run any backup Job with the same name, the records will be pruned again. Since it takes a long time to scan Volumes this can be very frustrating.
This is something that works, but it is not fun to do. It is absolutely a last-resort situation.
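For the record, a bscan run looks something like the sketch below. The volume names and the FileStorage device name are hypothetical; the real ones would be whichever Volumes hold the missing jobs.

# Re-create catalog records from the data written on the Volumes:
#   -s stores records in the catalog, -m updates media info, -v is verbose.
bscan -v -s -m -c /usr/local/etc/bacula/bacula-sd.conf \
      -V 'Vol-0001|Vol-0002' FileStorage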
But wait, there’s more
How much data am I missing?
What’s the last job in the pg03 database (which, at this point, matches pg02, because no further jobs have been run):
bacula=# select * from job order by jobid desc limit 1;
 jobid  | job | name | type | level | clientid | jobstatus | schedtime | starttime | endtime | jobtdate | volsessionid | volsessiontime | jobfiles | jobbytes | joberrors | jobmissingfiles | poolid | filesetid | purgedfiles | hasbase | realendtime | priorjobid | readbytes | hascache | reviewed | comment 
--------+--------------------------------------+---------------+------+-------+----------+-----------+---------------------+---------------------+---------+------------+--------------+----------------+----------+----------+-----------+-----------------+--------+-----------+-------------+---------+-------------+------------+-----------+----------+----------+---------
 361222 | BackupCatalog.2023-11-21_13.18.08_22 | BackupCatalog | B | F | 32 | R | 2023-11-21 13:18:08 | 2023-11-21 13:18:10 | | 1700572690 | 0 | 0 | 0 | 0 | 0 | 0 | 41 | 269 | 0 | 0 | | | 0 | 0 | 0 | 
(1 row)
Today? OK, that looks like a test I ran today. Let’s check a few more.
bacula=# select * from job order by jobid desc limit 10;
 jobid  | job | name | type | level | clientid | jobstatus | schedtime | starttime | endtime | jobtdate | volsessionid | volsessiontime | jobfiles | jobbytes | joberrors | jobmissingfiles | poolid | filesetid | purgedfiles | hasbase | realendtime | priorjobid | readbytes | hascache | reviewed | comment 
--------+---------------------------------------------------------------------+----------------------------------------------+------+-------+----------+-----------+---------------------+---------------------+---------------------+------------+--------------+----------------+----------+------------+-----------+-----------------+--------+-----------+-------------+---------+---------------------+------------+------------+----------+----------+---------
 361222 | BackupCatalog.2023-11-21_13.18.08_22 | BackupCatalog | B | F | 32 | R | 2023-11-21 13:18:08 | 2023-11-21 13:18:10 | | 1700572690 | 0 | 0 | 0 | 0 | 0 | 0 | 41 | 269 | 0 | 0 | | | 0 | 0 | 0 | 
 361221 | BackupCatalog.2023-11-21_13.17.42_21 | BackupCatalog | B | F | 32 | T | 2023-11-21 13:17:42 | 2023-11-21 13:17:44 | 2023-11-21 13:17:45 | 1700572665 | 364 | 1699489736 | 1 | 0 | 0 | 0 | 41 | 269 | 0 | 0 | 2023-11-21 13:17:45 | 0 | 0 | 0 | 0 | 
 361220 | BackupCatalog.2023-11-21_13.17.08_20 | BackupCatalog | B | F | 32 | T | 2023-11-21 13:17:08 | 2023-11-21 13:17:11 | 2023-11-21 13:17:11 | 1700572631 | 363 | 1699489736 | 1 | 0 | 0 | 0 | 41 | 269 | 0 | 0 | 2023-11-21 13:17:11 | 0 | 0 | 0 | 0 | 
 361219 | BackupCatalog.2023-11-21_03.05.03_15 | BackupCatalog | B | I | 32 | T | 2023-11-21 03:05:03 | 2023-11-21 04:29:22 | 2023-11-21 04:29:22 | 1700540962 | 362 | 1699489736 | 1 | 0 | 0 | 0 | 41 | 269 | 0 | 0 | 2023-11-21 04:29:22 | 0 | 0 | 0 | 0 | 
 361218 | zuul_jail_snapshots.2023-11-21_03.05.03_14 | zuul jail snapshots | B | I | 55 | T | 2023-11-21 03:05:03 | 2023-11-21 03:09:23 | 2023-11-21 03:41:53 | 1700538113 | 355 | 1699489736 | 3825 | 7230834138 | 0 | 0 | 40 | 254 | 0 | 0 | 2023-11-21 03:41:53 | 0 | 7230834138 | 0 | 0 | 
 361217 | zuul_basic.2023-11-21_03.05.02_13 | zuul basic | B | I | 55 | T | 2023-11-21 03:05:02 | 2023-11-21 03:08:51 | 2023-11-21 03:09:27 | 1700536167 | 352 | 1699489736 | 83 | 9752592 | 0 | 0 | 40 | 298 | 0 | 0 | 2023-11-21 03:09:27 | 0 | 9752592 | 0 | 0 | 
 361216 | x8dtu_jail_snapshots.2023-11-21_03.05.02_12 | x8dtu jail snapshots | B | I | 59 | T | 2023-11-21 03:05:02 | 2023-11-21 03:06:09 | 2023-11-21 03:12:17 | 1700536337 | 351 | 1699489736 | 318 | 989590316 | 0 | 0 | 40 | 276 | 0 | 0 | 2023-11-21 03:12:17 | 0 | 989590316 | 0 | 0 | 
 361215 | x8dtu_basic.2023-11-21_03.05.02_11 | x8dtu basic | B | I | 59 | T | 2023-11-21 03:05:02 | 2023-11-21 03:07:32 | 2023-11-21 03:09:18 | 1700536158 | 349 | 1699489736 | 36 | 3212169 | 0 | 0 | 40 | 298 | 0 | 0 | 2023-11-21 03:09:18 | 0 | 3212169 | 0 | 0 | 
 361214 | tallboy_Papers_Jail_PostgreSQL_Configuration.2023-11-21_03.05.02_10 | tallboy Papers Jail PostgreSQL Configuration | B | I | 39 | T | 2023-11-21 03:05:02 | 2023-11-21 03:07:58 | 2023-11-21 03:09:16 | 1700536156 | 348 | 1699489736 | 0 | 0 | 0 | 0 | 40 | 286 | 0 | 0 | 2023-11-21 03:09:16 | 0 | 0 | 0 | 0 | 
 361213 | tallboy_Papers_Jail.2023-11-21_03.05.02_09 | tallboy Papers Jail | B | I | 39 | T | 2023-11-21 03:05:02 | 2023-11-21 03:07:30 | 2023-11-21 03:09:15 | 1700536155 | 350 | 1699489736 | 0 | 0 | 0 | 0 | 40 | 284 | 0 | 0 | 2023-11-21 03:09:15 | 0 | 0 | 0 | 0 | 
(10 rows)
bacula=#
Wait, what? How can that be? How are the backups from last night in there?
Check the source
Remember when I checked the timestamp on the file? It was early yesterday morning. So how can there be backups from early today?
By this time, I figured I should check the bacula-dir process:
[21:02 bacula dan /usr/local/etc/bacula] % ps auwwx | grep bacula-dir
bacula 22712  0.0  0.0 215680 71836  -  IsJ   8Nov23   1:43.65 /usr/local/sbin/bacula-dir -u bacula -g bacula -v -c /usr/local/etc/bacula/bacula-dir.conf
dan    81762  0.0  0.0  12812  2360 11  S+J  21:25     0:00.00 grep bacula-dir
BONUS!
Although I changed the configuration file to point at the new database server, I [successfully] failed to restart bacula-dir.
Nothing has been lost. It’s been using pg02 all along.
Wow.
What I did next
I have just:
- amended pg_hba.conf on pg02 so Bacula can no longer connect to that database
- restarted bacula-dir and confirmed it is using pg03
- ran a small backup job
- confirmed that the new job was in the pg03 database (roughly as sketched below)
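Roughly, those steps looked like this. The pg_hba.conf line, the address range, and the service names are assumptions about my setup, not copies from the hosts:

# On pg02: add a reject rule for the bacula database near the top of
# pg_hba.conf (address range is illustrative), then reload:
#   host    bacula    bacula    10.0.0.0/24    reject
sudo service postgresql reload

# On the bacula host: restart the director so it rereads bacula-dir.conf:
sudo service bacula-dir restart

# On pg03: confirm the new job landed in the migrated database:
psql -d bacula -c 'select jobid, name, jobstatus, starttime from job order by jobid desc limit 5;'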
Ooops.
The next day, no backup
The next day, the Catalog backup had failed. Which was to be expected, had I thought about it.
22-Nov 04:40 dbclone-fd JobId 361249: ClientRunBeforeJob: pg_dump: error: server version: 16.1; pg_dump version: 14.10
22-Nov 04:40 dbclone-fd JobId 361249: ClientRunBeforeJob: pg_dump: error: aborting because of server version mismatch
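The root cause is just the client-side pg_dump on dbclone (14.10) being older than the new server (16.1). Something like this confirms it from dbclone; purely illustrative, with the versions taken from the job log above:

pg_dump --version                                   # pg_dump (PostgreSQL) 14.10
psql -h pg03 -d bacula -c 'show server_version;'    # 16.1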
The solution: move the dbclone host onto PostgreSQL 16.
[13:15 dbclone dan ~] % sudo pkg install postgresql16-server postgresql16-client postgresql16-contrib postgresql16-plperl
Updating local repository catalogue...
local repository is up to date.
All repositories are up to date.
Checking integrity... done (5 conflicting)
  - postgresql16-server-16.1_1 conflicts with postgresql14-server-14.10_1 on /usr/local/bin/initdb
  - postgresql16-client-16.1 conflicts with postgresql14-client-14.10 on /usr/local/bin/clusterdb
  - postgresql16-client-16.1 conflicts with postgresql14-client-14.10 on /usr/local/bin/clusterdb
  - postgresql16-plperl-16.1 conflicts with postgresql14-plperl-14.10 on /usr/local/include/postgresql/server/plperl.h
  - postgresql16-contrib-16.1 conflicts with postgresql14-contrib-14.10 on /usr/local/bin/oid2name
Checking integrity... done (0 conflicting)
The following 11 package(s) will be affected (of 0 checked):

Installed packages to be REMOVED:
        postgresql14-client: 14.10
        postgresql14-contrib: 14.10
        postgresql14-plperl: 14.10
        postgresql14-server: 14.10_1

New packages to be INSTALLED:
        postgresql16-client: 16.1
        postgresql16-contrib: 16.1
        postgresql16-plperl: 16.1
        postgresql16-server: 16.1_1

Installed packages to be REINSTALLED:
        nagios-check_postgres-2.26.0 (direct dependency changed: postgresql16-client)
        nagios-plugins-2.4.4,1 (direct dependency changed: postgresql16-client)
        pkg-1.20.9

Number of packages to be removed: 4
Number of packages to be installed: 4
Number of packages to be reinstalled: 3

The process will require 9 MiB more space.

Proceed with this action? [y/N]: y
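After the package swap, a quick sanity check (illustrative) that the client tools on dbclone now match the server:

pg_dump --version    # should now report pg_dump (PostgreSQL) 16.1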