I use Tarsnap for my critical data. Case in point: I use it to back up my Bacula database dump. I use Bacula to back up my hosts. The database in question keeps track of what was backed up, from what host, the file size, checksum, where that backup is now, and many other items. Losing this data is annoying but not a disaster. It can be recreated from the backup volumes, but that is time-consuming. As it is, the file is dumped daily and rsynced to multiple locations.
I also back up that database daily via tarsnap. I’ve been doing this since at least 2015-10-09.
The uncompressed dump of this PostgreSQL database is now about 117G.
# ls -l bacula.dump
-rw-r----- 1 10839 10839 125497737071 Sep 8 03:29 bacula.dump
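As a quick check on that figure, here is a minimal sketch that converts the byte count from the listing into both binary (GiB) and decimal (GB) units. The function names `bytes_to_gib` and `bytes_to_gb` are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers: convert a byte count to GiB (2^30 bytes)
# or GB (10^9 bytes), rounded to two decimals.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

bytes_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'
}

# Size of bacula.dump taken from the ls output above.
bytes_to_gib 125497737071
bytes_to_gb 125497737071
```

The same file is roughly 117 in binary units and roughly 125.5 in decimal units, which is worth keeping in mind when comparing numbers from different tools.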
Let’s look at recent usage by that host:
The latest Daily storage value is 96G.
Using this command, I obtained a list of the archives stored:
tarsnap --list-archives -vv > ~/tarsnap-knew-archive-list
See man 1 tarsnap
I found 1751 archives; the oldest was created on 2015-10-08 19:01:17.
This is a great example of Tarsnap deduplication and compression. I have 5 years of backups taking up only 96G and the latest backup is 113G.
By comparison, my other tarsnap backups take up this amount of space:
backup | size |
---|---|
bacula dump | 96G |
bacula configuration | 13.7G |
subversion | 8G |
supernews | 32.5G |
zuul-postgresql | 0.18G |
zuul-mysql | 0.57G |
zuul-pg02 | 5.7G |
I’m going to trim down the dump archives, for sure.
I’m curious about that bacula configuration archive. The Bacula configuration is only about 600K:
$ cd /usr/local/etc/bacula
$ sudo du -ch .
608K .
608K total
$
Checking the archive list for that machine, I find 6 database backups from early October 2015.
Let’s delete those backups first. The names of those archives are:
bacula.int.BaculaDatabase.2015-10-02
bacula.int.BaculaDatabase.2015-10-03
bacula.int.BaculaDatabase.2015-10-05
bacula.int.BaculaDatabase.2015-10-06
bacula.int.BaculaDatabase.2015-10-07
bacula.int.BaculaDatabase.2015-10-08
Let’s delete one:
# tarsnap -d -f bacula.int.BaculaDatabase.2015-10-02
                                         Total size  Compressed size
All archives                           196056412740      57482749453
  (unique data)                         50147278933      14694175940
This archive                            48831544077      14324291125
Deleted data                                2073099          1672760
#
Let’s delete the rest (based on Delete multiple archives faster):
[dan@bacula:~] $ sudo tarsnap -d \
> -f bacula.int.BaculaDatabase.2015-10-03 \
> -f bacula.int.BaculaDatabase.2015-10-05 \
> -f bacula.int.BaculaDatabase.2015-10-06 \
> -f bacula.int.BaculaDatabase.2015-10-07 \
> -f bacula.int.BaculaDatabase.2015-10-08
                                         Total size  Compressed size
All archives                           147224869967      43158468600
  (unique data)                         50147260360      14694156775
bacula.int.BaculaDatabase.2015-10-03    48831542773      14324280853
Deleted data                                  18573            19165
                                         Total size  Compressed size
All archives                            98393327194      28834187747
  (unique data)                         50147241787      14694137610
bacula.int.BaculaDatabase.2015-10-05    48831542773      14324280853
Deleted data                                  18573            19165
                                         Total size  Compressed size
All archives                            49561784421      14509906894
  (unique data)                         49265856159      14448990041
bacula.int.BaculaDatabase.2015-10-06    48831542773      14324280853
Deleted data                              881385628        245147569
                                         Total size  Compressed size
All archives                              314745728         65670242
  (unique data)                            19214507          4841679
bacula.int.BaculaDatabase.2015-10-07    49247038693      14444236652
Deleted data                            49246641652      14444148362
                                         Total size  Compressed size
All archives                              314744195         65668842
  (unique data)                            19212974          4840279
bacula.int.BaculaDatabase.2015-10-08           1533             1400
Deleted data                                   1533             1400
[dan@bacula:~] $
I won’t see the change in the ‘Recent account usage by machine’ page because that ‘updates shortly after midnight UTC’. I’ll come back tomorrow.
In the meantime, I think I can delete all my old Bacula database backups from before 2020. For fun, I will keep each backup from 01-01, and the oldest backup.
Here is how I can get that list from the existing file:
[dan@knew:~] $ head /root/tarsnap-knew-archive-list
bacula.int.BaculaDatabase.2020-08-13 2020-08-13 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-08-13 bacula.dump
bacula.int.BaculaDatabase.2018-08-17 2018-08-17 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-08-17 bacula.dump
bacula.int.BaculaDatabase.2018-11-08 2018-11-08 13:25:01 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-11-08 bacula.dump
bacula.int.BaculaDatabase.2020-07-08 2020-07-08 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-07-08 bacula.dump
bacula.int.BaculaDatabase.2016-05-25 2016-05-25 13:25:02 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-05-25 bacula.dump
bacula.int.BaculaDatabase.2018-08-09 2018-08-09 13:25:02 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-08-09 bacula.dump
bacula.int.BaculaDatabase.2016-10-12 2016-10-12 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-10-12 bacula.dump
bacula.int.BaculaDatabase.2016-01-20 2016-01-20 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-01-20 bacula.dump
bacula.int.BaculaDatabase.2019-02-06 2019-02-06 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2019-02-06 bacula.dump
bacula.int.BaculaDatabase.2016-03-18 2016-03-18 13:25:04 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-03-18 bacula.dump
[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list | head
bacula.int.BaculaDatabase.2020-08-13
bacula.int.BaculaDatabase.2018-08-17
bacula.int.BaculaDatabase.2018-11-08
bacula.int.BaculaDatabase.2020-07-08
bacula.int.BaculaDatabase.2016-05-25
bacula.int.BaculaDatabase.2018-08-09
bacula.int.BaculaDatabase.2016-10-12
bacula.int.BaculaDatabase.2016-01-20
bacula.int.BaculaDatabase.2019-02-06
bacula.int.BaculaDatabase.2016-03-18
[dan@knew:~] $
Oh wait, let’s sort that to get a proper range:
[dan@knew:/root] $ sort tarsnap-knew-archive-list | tail -2
bacula.int.BaculaDatabase.2020-09-05 2020-09-05 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-09-05 bacula.dump
bacula.int.BaculaDatabase.2020-09-07 2020-09-07 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-09-07 bacula.dump
[dan@knew:/root] $
[dan@knew:/root] $ sort tarsnap-knew-archive-list | head -2
bacula.int.BaculaDatabase.2015-10-08 2015-10-08 19:01:17 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2015-10-08 bacula.dump
bacula.int.BaculaDatabase.2015-10-09 2015-10-09 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2015-10-09 bacula.dump
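The `-w` option of `cut` used above is a FreeBSD extension, so on other systems the same extraction can be done with `awk`. The following is a sketch only: it assumes the three-column layout shown above, uses a few lines copied from that listing as sample data, and the helper name `archive_names` is invented here. Sorting by name gives date order because the names share a prefix and end in an ISO date.

```shell
#!/bin/sh
# Sample lines in the same layout as the archive list (name, timestamp, command).
list='bacula.int.BaculaDatabase.2020-08-13 2020-08-13 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-08-13 bacula.dump
bacula.int.BaculaDatabase.2018-08-17 2018-08-17 13:25:00 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-08-17 bacula.dump
bacula.int.BaculaDatabase.2016-05-25 2016-05-25 13:25:02 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-05-25 bacula.dump
bacula.int.BaculaDatabase.2015-10-08 2015-10-08 19:01:17 /usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2015-10-08 bacula.dump'

# Hypothetical helper: print only the first whitespace-separated field.
archive_names() {
    awk '{ print $1 }'
}

oldest=$(printf '%s\n' "$list" | archive_names | sort | head -n 1)
newest=$(printf '%s\n' "$list" | archive_names | sort | tail -n 1)

echo "oldest: $oldest"
echo "newest: $newest"
```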
Backups going back 5 years. Yeah, that might be a bit excessive, even for me. I usually keep them for three years at home.
Knowing that, let’s select the entries I want to keep:
[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list | egrep -e '-01-01|2015-10-08' | sort
bacula.int.BaculaDatabase.2015-10-08
bacula.int.BaculaDatabase.2016-01-01
bacula.int.BaculaDatabase.2017-01-01
bacula.int.BaculaDatabase.2019-01-01
bacula.int.BaculaDatabase.2020-01-01
[dan@knew:~] $
I sorted the output just to make it easier.
Now dump everything else into a file by inverting the match with -v:
[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list | egrep -ve '-01-01|2015-10-08' > tarsnap-volumes-to-delete
[dan@knew:~] $ wc -l tarsnap-volumes-to-delete
1746 tarsnap-volumes-to-delete
Oh wait, I forgot to exclude 2020:
[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list | egrep -ve '-01-01|2015-10-08|bacula.int.BaculaDatabase.2020' > tarsnap-volumes-to-delete
[dan@knew:~] $ wc -l tarsnap-volumes-to-delete
1503 tarsnap-volumes-to-delete
[dan@knew:~] $
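The keep/delete split above can be written as a pair of small filters. This is a sketch rather than the exact commands used: the pattern is anchored more tightly than the one above, `grep -E` stands in for `egrep`, the sample names are a small subset, and the names `keep_pattern`, `to_keep` and `to_delete` are invented for the example. The explicit `-e` matters because the pattern starts with a dash.

```shell
#!/bin/sh
# Keep: every 01-01 archive, the oldest archive (2015-10-08), and all of 2020.
keep_pattern='-01-01$|\.2015-10-08$|\.2020-'

# Hypothetical helpers: read archive names on stdin.
to_keep() {
    grep -E -e "$keep_pattern"
}

to_delete() {
    grep -E -v -e "$keep_pattern"
}

names='bacula.int.BaculaDatabase.2020-08-13
bacula.int.BaculaDatabase.2018-08-17
bacula.int.BaculaDatabase.2016-05-25
bacula.int.BaculaDatabase.2016-01-01
bacula.int.BaculaDatabase.2020-01-01
bacula.int.BaculaDatabase.2015-10-08'

echo "keep:"
printf '%s\n' "$names" | to_keep | sort
echo "delete:"
printf '%s\n' "$names" | to_delete | sort
```

Sorting the delete list at this point also makes it easier to see later how far a long-running delete has progressed.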
I used an editor to quickly modify that file to look like this:
[dan@knew:~] $ head tarsnap-volumes-to-delete
#!/bin/sh
tarsnap -d \
-f bacula.int.BaculaDatabase.2018-08-17 \
-f bacula.int.BaculaDatabase.2018-11-08 \
-f bacula.int.BaculaDatabase.2016-05-25 \
-f bacula.int.BaculaDatabase.2018-08-09 \
-f bacula.int.BaculaDatabase.2016-10-12 \
-f bacula.int.BaculaDatabase.2016-01-20 \
-f bacula.int.BaculaDatabase.2019-02-06 \
-f bacula.int.BaculaDatabase.2016-03-18 \
-f bacula.int.BaculaDatabase.2018-01-15 \
[dan@knew:~] $
EDIT 2021-09-29: The tarsnap -d \ was missing from the above script until David Newman told me. Thank you.
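Instead of an editor, the same script could be generated from the list of names. Here is a minimal sketch, assuming one archive name per input line; the function name `make_delete_script` is made up, and the generator only prints the script, it does not run tarsnap.

```shell
#!/bin/sh
# Hypothetical helper: read archive names on stdin and print a shell script
# that deletes them all with a single "tarsnap -d" and repeated -f options.
make_delete_script() {
    awk '
        NF { names[++n] = $1 }          # collect non-empty lines
        END {
            if (n == 0) exit 1          # nothing to delete: emit no script
            print "#!/bin/sh"
            printf "tarsnap -d"
            for (i = 1; i <= n; i++)
                printf " \\\n    -f %s", names[i]
            printf "\n"                 # last line has no trailing backslash
        }'
}

# Demo with two names taken from the list above; output goes to stdout.
printf '%s\n' \
    bacula.int.BaculaDatabase.2018-08-17 \
    bacula.int.BaculaDatabase.2018-11-08 | make_delete_script
```

Generating the file this way avoids the kind of omission noted in the edit above, since the `tarsnap -d` line is always emitted before the first `-f`.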
This delete will take a while so I started a tmux session. I did a chmod +x on the file.
I started the command and went on to do other things. It is deleting about 1500 archives, so I expect it to take a few hours at least.
[dan@knew:~] $ time sudo ./tarsnap-volumes-to-delete
I wish I had sorted that list. I’d know easily where we were.
I know we are on bacula.int.BaculaDatabase.2018-08-19 which is line 836 of 1505.
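One way to get that position without opening the file is to ask `grep` for the line number. This is a sketch under the assumption that the archive currently being deleted is known (for example from tarsnap's output in the tmux session); the function name `progress` and the small temporary script are invented for the demo.

```shell
#!/bin/sh
# Hypothetical helper.
# $1 = delete script, $2 = name of the archive currently being deleted.
progress() {
    total=$(wc -l < "$1" | tr -d ' ')
    # -F treats the name as a fixed string, so the dots are not wildcards.
    line=$(grep -n -F -- "$2" "$1" | head -n 1 | cut -d: -f1)
    echo "line $line of $total"
}

# Demo against a tiny stand-in for the real script.
tmp=$(mktemp)
printf '%s\n' \
    '#!/bin/sh' \
    'tarsnap -d \' \
    '-f bacula.int.BaculaDatabase.2018-08-17 \' \
    '-f bacula.int.BaculaDatabase.2018-08-19' > "$tmp"
progress "$tmp" bacula.int.BaculaDatabase.2018-08-19
rm -f "$tmp"
```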
$ ps auwwx | grep tmux
dan 78234 0.0 0.0 14344 5872 - Is 13:15 0:00.36 tmux: server (/tmp//tmux-1001/default) (tmux)
tmux was started at 13:15 and it is now 20:49 – so that’s 7.5 hours to get about half-way through. This should finish overnight.
Night passes….
The next morning I found:
real 819m10.118s
user 372m34.315s
sys 3m23.045s
That is about 13 hours and 40 minutes for 1503 archives, or roughly 18 archives every 10 minutes.
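The rate follows directly from the `real` time and the archive count. A small sketch of the arithmetic, with a made-up function name `rate`:

```shell
#!/bin/sh
# Hypothetical helper.
# $1 = minutes and $2 = seconds from the "real" line, $3 = archives deleted.
rate() {
    awk -v m="$1" -v s="$2" -v n="$3" 'BEGIN {
        total = m * 60 + s
        printf "%.1f seconds per archive, %.1f archives per 10 minutes\n",
               total / n, 600 * n / total
    }'
}

# real 819m10.118s, 1503 archives
rate 819 10.118 1503
```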
I want to compare before and after disk usage, but I may have to wait until 0000 UTC when the statistics are updated.
The next day (2020-09-10), I found these values:
name | before | after |
---|---|---|
bacula dump | 96G | 59.4GB |
bacula configuration | 13.7G | 4.6MB |
That’s telling me that 1503 days of backups required an additional 37GB of storage.
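Spread across the deleted archives, that is a small amount per day. The sketch below works it out, assuming the G and GB figures in the table are decimal gigabytes (1 GB = 1000 MB); the function name `storage_delta` is invented here.

```shell
#!/bin/sh
# Hypothetical helper.
# $1 = storage before (GB), $2 = storage after (GB), $3 = archives deleted.
# Assumes decimal units: 1 GB = 1000 MB.
storage_delta() {
    awk -v before="$1" -v after="$2" -v n="$3" 'BEGIN {
        freed = before - after
        printf "%.1f GB reclaimed, about %.1f MB per archive\n",
               freed, freed * 1000 / n
    }'
}

storage_delta 96 59.4 1503
```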
For comparison, the database dump was this size on the indicated dates:
date | size | FreeBSD version |
---|---|---|
2015-11-03 | 50.49 GB | 10.2-RELEASE-p2 |
2016-12-22 | 56.77 GB | 10.3-RELEASE-p6 |
2017-12-02 | 65.33 GB | 11.1-RELEASE-p1 |
2018-12-25 | 77.35 GB | 11.2-RELEASE |
2019-12-25 | 100.3 GB | 12.0-RELEASE-p5 |
2020-09-10 | 125.5 GB | -RELEASE-p7 |
I obtained that data by going back through the old email. You keep your old notifications? I just checked. These bacula notification emails go back to November 2015. Hmm, I also keep those ‘security run output’ emails. They go back to June 2010. Why keep 130,996 old emails? For blog posts like this.
Bandwidth
That large database dump consumed just 1.05GB in bandwidth. Tarsnap sends only the changes. The file is in plain text format.
Storage
I am now using 106.4 GB total. I suspect I can delete some of the older archives there too.
Automation
I should automate the disposal of old archives. One day.
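One possible shape for that automation, as a sketch only: a filter that takes a cutoff date and prints the archives older than it, while keeping the oldest archive and every 01-01 archive, mirroring the manual selection above. It assumes all names share one prefix and end in YYYY-MM-DD; the function name `select_expired` and the sample cutoff are made up, and nothing here calls tarsnap.

```shell
#!/bin/sh
# Hypothetical helper.
# $1 = cutoff date (YYYY-MM-DD); archive names on stdin, one per line.
# Prints the names that are older than the cutoff and not on the keep list.
select_expired() {
    sort | awk -v cutoff="$1" '
        NR == 1          { next }                               # oldest archive: keep
                         { date = substr($1, length($1) - 9) }  # trailing YYYY-MM-DD
        date ~ /-01-01$/ { next }                               # New Year archives: keep
        date < cutoff    { print $1 }                           # ISO dates compare as strings
    '
}

# Demo with a handful of names and an arbitrary cutoff.
select_expired 2017-09-08 <<'EOF'
bacula.int.BaculaDatabase.2020-08-13
bacula.int.BaculaDatabase.2018-08-17
bacula.int.BaculaDatabase.2016-05-25
bacula.int.BaculaDatabase.2016-01-01
bacula.int.BaculaDatabase.2015-10-09
bacula.int.BaculaDatabase.2015-10-08
EOF
```

The cutoff could come from `date -v-3y +%F` on FreeBSD or `date -d '3 years ago' +%F` with GNU date, and the resulting names would still need to be passed to `tarsnap -d` after a manual review.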