I want to copy the latest backups for each Bacula client over to a new host. I already do this, on an automated basis, for recent backups, primarily to have the same backups in two different places. In this case, I want to do this once.
This post might also be a good introduction to copying / moving jobs based on SQL queries. See also Bacula – copy to tape by job size and Moving Bacula Volumes from one Storage to Another.
Background
If you are familiar with Bacula and copy jobs, you can skip this.
Bacula is a client-server suite of backup programs. I’ve used it since at least 2004. The major components are:
- Director – bacula-dir
- Storage Daemon – bacula-sd
- File Daemon (client) – bacula-fd
The Director knows everything, and is the central component. The SD stores the backups. It could be tape, disk, cloud, etc.
The client (bacula-fd) runs on the host you want to back up. The Director contacts the FD, gives it a list of files to back up, and tells it where to send them (the SD). The SD receives the files and stores them away.
Overview – why the move?
In this post:
- FreeBSD 13.2-RELEASE-p2
- Bacula 9.6.7 (yes, I am behind on my upgrades here)
- bacula-sd-01 – the SD source of the backups to copy
- knew – the server upon which bacula-sd-01 runs in a FreeBSD jail
- bacula-sd-04 – the SD to which the backups are going to be copied
- r730-03 – the server on which the bacula-sd-04 jail is hosted
- FullFile – the pool from which the backups will be copied – this pool resides within bacula-sd-01
- FullFile-04 – the pool into which the backups will be copied. This is hosted on bacula-sd-04
I am retiring a server which hosts a bacula-sd. This FreeBSD host contains 20 x 5 TB drives, arranged in a ZFS zpool (known as system) as shown here:
[18:15 knew dan ~] % zpool list system
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
system  90.5T  65.2T  25.3T        -         -    17%    72%  1.00x  ONLINE  -
It has about 65TB used out of 90TB. Most of that is backups, the oldest of which dates back to 2004-09-01. I have the space. I used it.
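That 2004 date can be confirmed from the catalog. Here is a sketch – Job records that old may already have been pruned, so it checks the Media table instead:

-- Sketch: oldest volume write date still recorded in the catalog.
-- Job records from 2004 may have been pruned; Media.FirstWritten usually survives.
SELECT min(FirstWritten)
  FROM Media
 WHERE FirstWritten > '1970-01-01';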
The main zpool (system) looks like this:
[18:15 knew dan ~] % zpool status system
  pool: system
 state: ONLINE
  scan: scrub repaired 0B in 18:00:54 with 0 errors on Fri Aug 18 21:16:05 2023
config:

	NAME        STATE     READ WRITE CKSUM
	system      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    da3p3   ONLINE       0     0     0
	    da10p3  ONLINE       0     0     0
	    da9p3   ONLINE       0     0     0
	    da2p3   ONLINE       0     0     0
	    da13p3  ONLINE       0     0     0
	    da15p3  ONLINE       0     0     0
	    da11p3  ONLINE       0     0     0
	    da14p3  ONLINE       0     0     0
	    da8p3   ONLINE       0     0     0
	    da7p3   ONLINE       0     0     0
	  raidz2-1  ONLINE       0     0     0
	    da5p1   ONLINE       0     0     0
	    da6p1   ONLINE       0     0     0
	    da19p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0
	    da4p1   ONLINE       0     0     0
	    da1p1   ONLINE       0     0     0
	    da22p1  ONLINE       0     0     0
	    da16p1  ONLINE       0     0     0
	    da0p1   ONLINE       0     0     0
	    da18p1  ONLINE       0     0     0

errors: No known data errors
Twenty drives generate a bunch of heat and noise.
I’m downsizing to a smaller server with 4 x 12 TB drives which will give me 24TB of space. The goal is to reduce that noise, cool off the basement a little, and reduce my energy consumption.
These are the recent posts about that move:
- Identifying jails and data to migrate from old host to new host
- Bacula – calculating Maximum Volume Bytes and Maximum Volumes based on historical data
- Testing two 12TB drives to a Dell R730 on FreeBSD
The goal
My current goal: take the most recent full backups for each client and copy them over to the new SD. I will also copy the last 12 months of full backups.
I’ll go through the steps I carried out, showing you how the query evolved.
The clients
The first goal: get a list of backup jobs. You might say, why not a list of clients (hosts)? Well, each job is unique to a client, but a client can have multiple jobs. The goal is to get all the jobs, which will, by definition, cover all the clients.
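For illustration, a sketch like this (not part of my original notes) would show how many distinct job names each client has:

-- Sketch: count the distinct backup job names defined for each client.
SELECT C.Name AS client, count(DISTINCT J.Name) AS job_names
  FROM Client C
  JOIN Job J ON J.ClientId = C.ClientId
 WHERE J.Type = 'B'
 GROUP BY C.Name
 ORDER BY C.Name;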
I started with a query I had in place for duplicating monthly full backups. I amended it for this situation and came up with:
SELECT DISTINCT J.JobId, J.StartTime, J.jobbytes, pg_size_pretty(J.JobBytes), J.name
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'))
 ORDER BY J.JobBytes;
Some notes:
- Look at two Bacula pools, FullFile and MonthlyBackups.
- Look only at status T (completed successfully) and W (don’t ask me, I wasn’t sure what that was – see the Status table lookup after this list).
- Make sure the job actually backed something up (J.jobBytes > 0).
- Only consider jobs which have not already been copied over to the FullFile-04 pool (our destination).
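The Bacula catalog carries a Status table which maps these one-letter codes to descriptions – if I recall correctly, W is “Terminated normally with warnings”. A quick lookup (a sketch; column names assumed from the standard catalog schema):

-- Look up the JobStatus codes used in the query above.
-- Assumes the Status table shipped with the standard Bacula catalog schema.
SELECT JobStatus, JobStatusLong
  FROM Status
 WHERE JobStatus IN ('T', 'W');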
That job-selection query returns 1988 rows looking like this:
 jobid  |      starttime      |   jobbytes   | pg_size_pretty |                     name
--------+---------------------+--------------+----------------+----------------------------------------------
 352953 | 2023-03-25 20:01:51 |         7538 | 7538 bytes     | poudriere patches
 316186 | 2020-09-06 03:05:32 |        23909 | 23 kB          | tallboy Papers Jail PostgreSQL Configuration
 317515 | 2020-10-04 03:05:46 |        23909 | 23 kB          | tallboy Papers Jail PostgreSQL Configuration
 318709 | 2020-11-01 03:05:34 |        23909 | 23 kB          | tallboy Papers Jail PostgreSQL Configuration
...
 357135 | 2023-08-06 03:05:12 |        23909 | 23 kB          | tallboy Papers Jail PostgreSQL Configuration
 324595 | 2021-03-07 03:09:41 |       293572 | 287 kB         | bast pfsense config.xml
...
 349638 | 2022-12-22 03:09:09 |       366546 | 358 kB         | bast pfsense config.xml
 348601 | 2022-11-23 03:06:36 |       367428 | 359 kB         | bast pfsense config.xml
 321739 | 2021-01-03 03:05:07 |      1817031 | 1774 kB        | tallboy home
...
 323301 | 2021-02-07 03:05:03 | 543772633851 | 506 GB         | slocum jail snapshots
 324579 | 2021-03-07 03:05:48 | 575366356793 | 536 GB         | slocum jail snapshots
 325887 | 2021-04-04 03:06:27 | 602446699824 | 561 GB         | slocum jail snapshots
My goal: get a list of jobids for the latest job in each of those groups of names.
Doing the grouping
I started off with this:
with all_jobs as (
SELECT DISTINCT J.JobId, J.StartTime, J.name
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'))
   AND J.name in ('r710-01 basic', 'slocum basic')
 ORDER BY J.name)

select JobId, max(StartTime), name
  from all_jobs
 group by name, jobid
 order by 2 desc, name;
For ease of testing, I restricted it to just two job names. This was my first idea; I knew it would not work (grouping by both name and jobid returns every job, not just the latest per name), but I tried it anyway.
The output is:
 jobid  |         max         |     name
--------+---------------------+---------------
 351172 | 2023-02-05 03:05:15 | slocum basic
 349949 | 2023-01-01 03:05:16 | slocum basic
 348942 | 2022-12-04 03:05:14 | slocum basic
 346513 | 2022-11-06 03:05:16 | slocum basic
 345154 | 2022-10-02 03:05:04 | slocum basic
 344195 | 2022-09-04 03:05:15 | slocum basic
 343284 | 2022-08-07 03:05:15 | slocum basic
 342111 | 2022-07-03 03:05:16 | slocum basic
 341120 | 2022-06-05 03:05:42 | slocum basic
 339927 | 2022-05-01 03:05:20 | slocum basic
 338932 | 2022-04-03 03:05:14 | slocum basic
 337941 | 2022-03-06 03:05:45 | slocum basic
 336950 | 2022-02-06 03:05:14 | slocum basic
 335728 | 2022-01-02 03:05:14 | slocum basic
 334737 | 2021-12-05 03:05:14 | slocum basic
 333680 | 2021-11-07 03:06:15 | slocum basic
 332434 | 2021-10-03 03:05:27 | slocum basic
 331433 | 2021-09-05 03:05:32 | slocum basic
 330205 | 2021-08-01 03:05:45 | slocum basic
 329138 | 2021-07-04 03:05:34 | slocum basic
 328198 | 2021-06-06 03:05:30 | slocum basic
 327009 | 2021-05-02 03:06:43 | slocum basic
 325899 | 2021-04-04 03:05:27 | slocum basic
 324591 | 2021-03-07 03:06:20 | slocum basic
 323313 | 2021-02-07 03:05:33 | slocum basic
 321743 | 2021-01-03 03:05:34 | slocum basic
 320481 | 2020-12-06 03:05:28 | slocum basic
 318710 | 2020-11-01 03:05:40 | slocum basic
 317516 | 2020-10-04 03:05:53 | slocum basic
 316187 | 2020-09-06 03:05:36 | slocum basic
 314658 | 2020-08-02 03:10:21 | slocum basic
 313293 | 2020-07-05 03:11:05 | slocum basic
 311897 | 2020-06-07 03:08:49 | slocum basic
 310248 | 2020-05-03 03:07:49 | slocum basic
 308939 | 2020-04-05 03:08:34 | slocum basic
 307269 | 2020-03-01 03:10:23 | slocum basic
 297160 | 2019-10-06 03:52:24 | r710-01 basic
 295628 | 2019-09-01 04:06:34 | r710-01 basic
 294271 | 2019-08-04 03:34:00 | r710-01 basic
 293102 | 2019-07-07 04:34:58 | r710-01 basic
 291870 | 2019-06-09 15:08:44 | r710-01 basic
 290507 | 2019-05-05 03:39:48 | r710-01 basic
 289294 | 2019-04-07 04:19:18 | r710-01 basic
 287816 | 2019-03-03 03:58:34 | r710-01 basic
 286601 | 2019-02-03 04:03:21 | r710-01 basic
 285225 | 2019-01-06 03:38:34 | r710-01 basic
 283663 | 2018-12-02 03:30:50 | r710-01 basic
 281760 | 2018-11-04 17:34:30 | r710-01 basic
 280604 | 2018-10-07 03:27:15 | r710-01 basic
 279232 | 2018-09-02 03:39:33 | r710-01 basic
 278340 | 2018-08-11 03:05:07 | r710-01 basic
 278105 | 2018-08-05 03:34:31 | r710-01 basic
 276761 | 2018-07-01 03:34:31 | r710-01 basic
 275608 | 2018-06-03 03:34:09 | r710-01 basic
 274449 | 2018-05-06 03:25:22 | r710-01 basic
 273055 | 2018-04-01 03:38:38 | r710-01 basic
 271834 | 2018-03-04 03:42:58 | r710-01 basic
(57 rows)
Given that data, the goal is this output:
 jobid  |      starttime      |     name
--------+---------------------+---------------
 297160 | 2019-10-06 03:52:24 | r710-01 basic
 351172 | 2023-02-05 03:05:15 | slocum basic
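As an aside, PostgreSQL’s DISTINCT ON can produce exactly that shape in one pass. I did not go this route, but a minimal sketch (with the NOT IN “already copied” subquery dropped for brevity) would be:

-- Sketch: newest job per job name via DISTINCT ON.
-- The full version would also carry the NOT IN (already copied) filter.
SELECT DISTINCT ON (J.name)
       J.JobId, J.StartTime, J.name
  FROM Job J
  JOIN Pool P ON P.PoolId = J.PoolId
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.JobBytes > 0
 ORDER BY J.name, J.StartTime DESC;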
The working query
Using that query as a starting point, and using a CTE (Common Table Expression) (see A PostgreSQL query runs in 1.7s – add a LIMIT 10, runs in 28.2 seconds), the working query was:
with max_start_times as (
SELECT max(J.SchedTime) as starttime, J.name
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'))
 group by J.name
 ORDER BY J.name)

select J.JobId, J.SchedTime, J.name, J.jobbytes, pg_size_pretty(J.JobBytes)
  from job J, max_start_times
 where J.SchedTime = max_start_times.starttime
   and J.name      = max_start_times.name
 order by schedtime;
The output is:
 jobid  |      schedtime      |                     name                     |   jobbytes   | pg_size_pretty
--------+---------------------+----------------------------------------------+--------------+----------------
  60369 | 2011-05-01 22:58:12 | kraken                                       |   5099585508 | 4863 MB
 271806 | 2018-03-03 22:18:21 | dbclone databases                            |   7246057113 | 6910 MB
 284091 | 2018-12-08 23:54:50 | bacula basic                                 |     31011756 | 30 MB
 297161 | 2019-10-06 03:05:01 | r710-01 jail snapshots                       |  14360201232 | 13 GB
 297162 | 2019-10-06 03:05:01 | r710-01 distfiles                            |   6113281371 | 5830 MB
 297160 | 2019-10-06 03:05:01 | r710-01 basic                                |    231577474 | 221 MB
 299757 | 2019-11-29 19:47:59 | dbclone basic                                |     52476628 | 50 MB
 313388 | 2020-07-05 23:30:00 | dent                                         | 157174284597 | 146 GB
 314650 | 2020-08-02 03:05:00 | x8dtu message archive snapshots              |  22597751942 | 21 GB
 325911 | 2021-04-04 03:05:01 | pkg01 poudriere snapshots                    | 150882961830 | 141 GB
 325918 | 2021-04-04 04:01:00 | mailjail snapshot                            |   8278322680 | 7895 MB
 341108 | 2022-06-05 03:04:01 | supernews FP msgs                            |  11750382127 | 11 GB
 341125 | 2022-06-05 03:05:02 | supernews                                    |   4395964621 | 4192 MB
 341124 | 2022-06-05 03:05:02 | supernews basic                              |    776608946 | 741 MB
 351158 | 2023-02-05 03:04:00 | slocum jail snapshots                        |  42472487155 | 40 GB
 351172 | 2023-02-05 03:05:01 | slocum basic                                 |    501044558 | 478 MB
 351170 | 2023-02-05 03:05:01 | r720-01 basic                                |    339367021 | 324 MB
 351171 | 2023-02-05 03:05:01 | r720-01 jail snapshots                       |  25069045437 | 23 GB
 351173 | 2023-02-05 03:05:01 | slocum home                                  |  34373297796 | 32 GB
 351174 | 2023-02-05 03:05:02 | slocum dev data snapshots                    | 139044882429 | 129 GB
 352953 | 2023-03-25 20:01:47 | poudriere patches                            |         7538 | 7538 bytes
 357115 | 2023-08-06 03:04:00 | knew jail snapshots                          | 104678222990 | 97 GB
 357116 | 2023-08-06 03:04:00 | repo-svn-snapshots                           |   8014004651 | 7643 MB
 357117 | 2023-08-06 03:04:00 | repo-git-snapshots                           |    203508338 | 194 MB
 357120 | 2023-08-06 03:05:00 | gelt basic                                   |    501464070 | 478 MB
 357119 | 2023-08-06 03:05:00 | fileserver basic                             |    302943799 | 289 MB
 357121 | 2023-08-06 03:05:00 | gelt                                         |    766176896 | 731 MB
 357122 | 2023-08-06 03:05:00 | knew basic                                   |    736416627 | 702 MB
 357124 | 2023-08-06 03:05:01 | mydev basic                                  |     20202304 | 19 MB
 357125 | 2023-08-06 03:05:01 | mydev home dir                               |   1762280497 | 1681 MB
 357138 | 2023-08-06 03:05:01 | zuul basic                                   |    539152403 | 514 MB
 357139 | 2023-08-06 03:05:01 | zuul jail snapshots                          | 178686051440 | 166 GB
 357140 | 2023-08-06 03:05:01 | BackupCatalog                                | 150488552494 | 140 GB
 357126 | 2023-08-06 03:05:01 | r730-01 basic                                |    263610998 | 251 MB
 357127 | 2023-08-06 03:05:01 | r730-01 jail snapshots                       |  79460721641 | 74 GB
 357128 | 2023-08-06 03:05:01 | unifi                                        |   1955982903 | 1865 MB
 357129 | 2023-08-06 03:05:01 | svn basic                                    |    130720665 | 125 MB
 357130 | 2023-08-06 03:05:01 | svn everything                               |   7207752679 | 6874 MB
 357131 | 2023-08-06 03:05:01 | tallboy basic                                |  43215780561 | 40 GB
 357132 | 2023-08-06 03:05:01 | tallboy home                                 |     82493261 | 79 MB
 357133 | 2023-08-06 03:05:01 | tallboy jail snapshots                       |   7162183004 | 6830 MB
 357134 | 2023-08-06 03:05:01 | tallboy Papers Jail                          |   4273469580 | 4075 MB
 357135 | 2023-08-06 03:05:01 | tallboy Papers Jail PostgreSQL Configuration |        23909 | 23 kB
 357136 | 2023-08-06 03:05:01 | x8dtu basic                                  |    301593185 | 288 MB
 357137 | 2023-08-06 03:05:01 | x8dtu jail snapshots                         |  18841297618 | 18 GB
 357645 | 2023-08-18 11:37:21 | r730-03 basic                                |    226418967 | 216 MB
 357680 | 2023-08-20 03:05:00 | bast pfsense config.xml                      |       307570 | 300 kB
 357690 | 2023-08-20 03:05:00 | ansible                                      |     13154231 | 13 MB
(48 rows)
That looks right. This is the query I’ll use to start copying over my data.
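Eventually this query will drive a copy job. Bacula copy jobs can use Selection Type = SQLQuery, in which case the Selection Pattern should be a query returning only JobIds. A trimmed-down sketch of the query above (again with the NOT IN filter omitted here for brevity):

-- Sketch: same selection logic reduced to a single JobId column,
-- the form expected by a copy job using Selection Type = SQLQuery.
with max_start_times as (
  SELECT max(J.SchedTime) as starttime, J.name
    FROM Job J, Pool P
   WHERE P.Name IN ('FullFile', 'MonthlyBackups')
     AND P.PoolId = J.PoolId
     AND J.Type = 'B'
     AND J.JobStatus IN ('T','W')
     AND J.JobBytes > 0
   GROUP BY J.name)
select J.JobId
  from Job J, max_start_times M
 where J.SchedTime = M.starttime
   and J.name      = M.name;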
How much space is that?
with max_start_times as (
SELECT max(J.SchedTime) as starttime, J.name
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'))
 group by J.name
 ORDER BY J.name)

select pg_size_pretty(sum(J.JobBytes))
  from job J, max_start_times
 where J.SchedTime = max_start_times.starttime
   and J.name      = max_start_times.name;
Giving:
 pg_size_pretty
----------------
 1155 GB
(1 row)
Note that this is the size of the raw data. On ZFS, with compression enabled, the space actually consumed may turn out to be noticeably less.
What clients have no backups in this list?
When writing this post, I thought: what clients have no backups?
select C.name
  FROM client C
 WHERE C.clientid not in (
       SELECT J.clientid
         FROM Job J, Pool P
        WHERE P.Name IN ('FullFile', 'MonthlyBackups')
          AND P.PoolId    = J.PoolId
          AND J.Type      = 'B'
          AND J.JobStatus IN ('T','W')
          AND J.jobBytes  > 0
          AND J.JobId NOT IN (SELECT J.PriorJobId
                                FROM Job J
                               WHERE J.Type      IN ('B','C')
                                 AND J.JobStatus IN ('T','W')
                                 AND J.PriorJobId != 0
                                 AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04')))
 order by name;
And we have:
              name
---------------------------------
 bast2-fd
 batteries-vpn.unixathome.org-fd
 batteries.unixathome.org-fd
 crey-fd
 dent-vpn-fd
 dfc-fd
 ducky-fd
 havoc-fd
 heckler-fd
 jester-fd
 laptop-fd
 laptop-vpn-fd
 laptop-wifi-5.3-fd
 laptop-xp-fd
 latens-fd
 lists-fd
 m20-fd
 m21-fd
 macbook-fd
 mailjail-fd
 minion-fd
 nebula-fd
 nezlok-fd
 ngaio-fd
 nyi-fd
 nz-fd
 pepper-fd
 polo-fd
 r610-fd
 tape01-fd
 tape02-fd
 undef-fd
 w2k-fd
 webserver-fd
 wocker-fd
 xeon-fd
(36 rows)
That list looks good. Some of those host names I haven’t seen in many years; I’m not worried about having no backups for them.
The past year
That query gets the last full backup of every job, including those for long-gone hosts. It makes sure I have the latest copy of anything I might need later.
Next, I want to get all the backups done over the past year. That should be about 6TB. Let’s see:
SELECT J.JobId, J.SchedTime, J.name, J.jobbytes, pg_size_pretty(J.JobBytes)
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.endtime   > current_timestamp - interval '12 months'
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'))
 ORDER BY J.name;
That brings me back 1049 jobs. How much space is that?
SELECT pg_size_pretty(sum(J.JobBytes))
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.endtime   > current_timestamp - interval '12 months'
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'));
Giving:
 pg_size_pretty
----------------
 6829 GB
(1 row)
That matches up with my estimates from Bacula – calculating Maximum Volume Bytes and Maximum Volumes based on historical data: a full monthly backup of everything is about 500 GB, and a year’s worth of that is roughly 6 TB.
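If you want to see how that total breaks down month by month, a variation like this should do it (a sketch I did not run as part of this exercise):

-- Sketch: monthly totals for the same set of backups over the past year.
SELECT date_trunc('month', J.EndTime) AS month,
       pg_size_pretty(sum(J.JobBytes)) AS backed_up
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId = J.PoolId
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.JobBytes > 0
   AND J.EndTime > current_timestamp - interval '12 months'
 GROUP BY month
 ORDER BY month;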
Room for improvement
I could purge my daily backups of the Catalog (the PostgreSQL database used by Bacula). I don’t need 1367 copies. I could keep 3 months of daily backups and, beyond that, keep only monthly backups.
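That copy count is easy to verify; something along these lines gives the number of successful BackupCatalog jobs in the catalog (a sketch):

-- Sketch: count successful BackupCatalog jobs of any level.
SELECT count(*)
  FROM Job J
 WHERE J.Name = 'BackupCatalog'
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W');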
How much space are my recent Catalog backups taking now? Recent such backups are about 150GB each. Let’s see:
bacula=# SELECT pg_size_pretty(sum(J.JobBytes))
  FROM Job J, Pool P
 WHERE P.PoolId    = J.PoolId
   AND J.Type      = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes  > 0
   AND J.endtime   > current_timestamp - interval '3 months'
   AND J.name      = 'BackupCatalog'
   AND J.JobId NOT IN (SELECT J.PriorJobId
                         FROM Job J
                        WHERE J.Type      IN ('B','C')
                          AND J.JobStatus IN ('T','W')
                          AND J.PriorJobId != 0
                          AND J.PoolId IN (SELECT P.poolid FROM pool P WHERE P.name = 'FullFile-04'));

 pg_size_pretty
----------------
 7408 GB
(1 row)
So. 7.4TB for three months – yeah, I won’t copy all that over, just the recent backups. Looking at all such backups, BackupCatalog consumes 18TB of space.
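That 18TB figure presumably comes from dropping the three-month restriction; a sketch of that check:

-- Sketch: total bytes across every successful BackupCatalog job recorded.
SELECT pg_size_pretty(sum(J.JobBytes))
  FROM Job J
 WHERE J.Name = 'BackupCatalog'
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W');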
I already split BackupCatalog into diff, inc, and full backups. That handles the retention and rotation.
Once there is a new Catalog backup, the old ones are of no use. Each one is a full backup (it’s just one file).
Let’s keep monthly BackupCatalog backups for a year, like the others. No extra work required. However, I’d like to create separate pools just for BackupCatalog:
- Incrementals – run daily, kept for 10 days
- Differentials – run every Sunday, except for the first Sunday (that’s the Full backup), kept for 6 weeks
That would be 10 incrementals + 6 differentials (each of which is effectively a full backup of that one file). At roughly 150 GB each, that’s 16 x 150 GB, or about 2.4 TB, just for the rotating Catalog backups. Add the 12 monthlies and the total climbs past 4 TB for Catalogs. That’s insane.
I know what I have to do: run dbcheck. I think I have never run that. Let’s try it one day and see how it goes.
What’s next
Next, I design the job for this and test it. I think that will come in another blog post.