Tarsnap – cleaning up old backups

I use Tarsnap for my critical data. Case in point, I use it to back up my Bacula database dump. I use Bacula to back up my hosts. The database in question keeps track of what was backed up, from what host, the file size, checksum, where that backup is now, and many other items. Losing this data is annoying but not a disaster. It can be recreated from the backup volumes, but that is time consuming. As it is, the file is dumped daily and rsynced to multiple locations.

I also back up that database daily via Tarsnap. I’ve been doing this since at least 2015-10-09.

The uncompressed dump of this PostgreSQL database is now about 117G.

# ls -l bacula.dump 
-rw-r-----  1 10839  10839  125497737071 Sep  8 03:29 bacula.dump
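As a quick sanity check on the "about 117G" figure, the byte count from ls can be converted by hand. This is only a sketch: it assumes G means GiB (2^30 bytes), and bytes_to_gib is a helper name made up for this example.

```shell
# Convert a byte count to GiB (2^30 bytes), one decimal place.
# bytes_to_gib is a hypothetical helper name, not part of any tool.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / (1024 * 1024 * 1024) }'
}

bytes_to_gib 125497737071   # prints 116.9
```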

Let’s look at recent usage by that host:

tarsnap recent usage

The latest Daily storage value is 96G.

Using this command, I obtained a list of the archives stored:

tarsnap --list-archives -vv > ~/tarsnap-knew-archive-list

See man 1 tarsnap

I found 1751 archives; the oldest was created on 2015-10-08 19:01:17.
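The count and the oldest archive can be pulled out of the saved list in one pass. A minimal sketch, assuming the -vv output is tab-separated (name, creation time, command line) as it appears later in this post; archive_summary is a made-up helper name.

```shell
# Summarise an archive list saved with "tarsnap --list-archives -vv".
# Assumes tab-separated fields: name, creation time, command line.
# archive_summary is a hypothetical helper name.
archive_summary() {
    awk -F '\t' '
        NF >= 2 {
            count++
            # ISO timestamps compare correctly as plain strings.
            if (oldest == "" || $2 < oldest) { oldest = $2; name = $1 }
        }
        END { printf "%d archives, oldest: %s (%s)\n", count, name, oldest }
    ' "$1"
}

# Usage: archive_summary ~/tarsnap-knew-archive-list
```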

This is a great example of Tarsnap deduplication and compression. I have 5 years of backups taking up only 96G, even though the latest backup alone is 113G.

By comparison, my other tarsnap backups take up this amount of space:


backup                  size
bacula dump             96G
bacula configuration    13.7G
subversion              8G
supernews               32.5G
zuul-postgresql         0.18G
zuul-mysql              0.57G
zuul-pg02               5.7G

I’m going to trim down the dump archives, for sure.

I’m curious about that bacula configuration archive. The Bacula configuration is only about 600K:

$ cd /usr/local/etc/bacula
$ sudo du -ch .
608K	.
608K	total
$ 

Checking the archive list for that machine, I find 6 database backups from early October 2015.

Let’s delete those backups first. The names of those archives are:

bacula.int.BaculaDatabase.2015-10-02
bacula.int.BaculaDatabase.2015-10-03
bacula.int.BaculaDatabase.2015-10-05
bacula.int.BaculaDatabase.2015-10-06
bacula.int.BaculaDatabase.2015-10-07
bacula.int.BaculaDatabase.2015-10-08

Let’s delete one:

# tarsnap -d -f bacula.int.BaculaDatabase.2015-10-02
                                       Total size  Compressed size
All archives                         196056412740      57482749453
  (unique data)                       50147278933      14694175940
This archive                          48831544077      14324291125
Deleted data                              2073099          1672760
# 
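The statistics above hint at why so little was freed: the total size of all archives is several times the unique data actually stored. A rough sketch of that ratio, using the "All archives" and "(unique data)" totals from the output; dedup_ratio is a made-up helper name.

```shell
# Ratio of total archive size to unique (deduplicated) size.
# dedup_ratio is a hypothetical helper; the numbers come from the output above.
dedup_ratio() {
    awk -v total="$1" -v unique="$2" 'BEGIN { printf "%.2f\n", total / unique }'
}

dedup_ratio 196056412740 50147278933   # prints 3.91
```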

Let’s delete the rest (based on “Delete multiple archives faster”):

[dan@bacula:~] $ sudo tarsnap -d \
> -f bacula.int.BaculaDatabase.2015-10-03 \
> -f bacula.int.BaculaDatabase.2015-10-05 \
> -f bacula.int.BaculaDatabase.2015-10-06 \
> -f bacula.int.BaculaDatabase.2015-10-07 \
> -f bacula.int.BaculaDatabase.2015-10-08
                                       Total size  Compressed size
All archives                         147224869967      43158468600
  (unique data)                       50147260360      14694156775
bacula.int.BaculaDatabase.2015-10-03      48831542773      14324280853
Deleted data                                18573            19165
                                       Total size  Compressed size
All archives                          98393327194      28834187747
  (unique data)                       50147241787      14694137610
bacula.int.BaculaDatabase.2015-10-05      48831542773      14324280853
Deleted data                                18573            19165
                                       Total size  Compressed size
All archives                          49561784421      14509906894
  (unique data)                       49265856159      14448990041
bacula.int.BaculaDatabase.2015-10-06      48831542773      14324280853
Deleted data                            881385628        245147569
                                       Total size  Compressed size
All archives                            314745728         65670242
  (unique data)                          19214507          4841679
bacula.int.BaculaDatabase.2015-10-07      49247038693      14444236652
Deleted data                          49246641652      14444148362
                                       Total size  Compressed size
All archives                            314744195         65668842
  (unique data)                          19212974          4840279
bacula.int.BaculaDatabase.2015-10-08             1533             1400
Deleted data                                 1533             1400
[dan@bacula:~] $ 

I won’t see the change in the ‘Recent account usage by machine’ page because that ‘updates shortly after midnight UTC’. I’ll come back tomorrow.

In the meantime, I think I can delete all my old Bacula database backups from before 2020. For fun, I will keep each backup from 01-01, and the oldest backup.

Here is how I can get that list from the existing file:

[dan@knew:~] $ head /root/tarsnap-knew-archive-list 
bacula.int.BaculaDatabase.2020-08-13	2020-08-13 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-08-13 bacula.dump
bacula.int.BaculaDatabase.2018-08-17	2018-08-17 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-08-17 bacula.dump
bacula.int.BaculaDatabase.2018-11-08	2018-11-08 13:25:01	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-11-08 bacula.dump
bacula.int.BaculaDatabase.2020-07-08	2020-07-08 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-07-08 bacula.dump
bacula.int.BaculaDatabase.2016-05-25	2016-05-25 13:25:02	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-05-25 bacula.dump
bacula.int.BaculaDatabase.2018-08-09	2018-08-09 13:25:02	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2018-08-09 bacula.dump
bacula.int.BaculaDatabase.2016-10-12	2016-10-12 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-10-12 bacula.dump
bacula.int.BaculaDatabase.2016-01-20	2016-01-20 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-01-20 bacula.dump
bacula.int.BaculaDatabase.2019-02-06	2019-02-06 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2019-02-06 bacula.dump
bacula.int.BaculaDatabase.2016-03-18	2016-03-18 13:25:04	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2016-03-18 bacula.dump
[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list  | head
bacula.int.BaculaDatabase.2020-08-13
bacula.int.BaculaDatabase.2018-08-17
bacula.int.BaculaDatabase.2018-11-08
bacula.int.BaculaDatabase.2020-07-08
bacula.int.BaculaDatabase.2016-05-25
bacula.int.BaculaDatabase.2018-08-09
bacula.int.BaculaDatabase.2016-10-12
bacula.int.BaculaDatabase.2016-01-20
bacula.int.BaculaDatabase.2019-02-06
bacula.int.BaculaDatabase.2016-03-18
[dan@knew:~] $ 

Oh wait, let’s sort that to get a proper range:

[dan@knew:/root] $ sort tarsnap-knew-archive-list | tail -2
bacula.int.BaculaDatabase.2020-09-05	2020-09-05 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-09-05 bacula.dump
bacula.int.BaculaDatabase.2020-09-07	2020-09-07 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2020-09-07 bacula.dump
[dan@knew:/root] $ 
[dan@knew:/root] $ sort tarsnap-knew-archive-list | head -2
bacula.int.BaculaDatabase.2015-10-08	2015-10-08 19:01:17	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2015-10-08 bacula.dump
bacula.int.BaculaDatabase.2015-10-09	2015-10-09 13:25:00	/usr/local/bin/tarsnap -c -f bacula.int.BaculaDatabase.2015-10-09 bacula.dump

Backups going back 5 years. Yeah, that might be a bit excessive, even for me. I usually keep them for three years at home.

Knowing that, let’s select the entries I want to keep:

[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list  | egrep -e '-01-01|2015-10-08' | sort
bacula.int.BaculaDatabase.2015-10-08
bacula.int.BaculaDatabase.2016-01-01
bacula.int.BaculaDatabase.2017-01-01
bacula.int.BaculaDatabase.2019-01-01
bacula.int.BaculaDatabase.2020-01-01
[dan@knew:~] $ 

I sorted the output just to make it easier.

Now, dump everything else, by using -v, into a file:

[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list  | egrep -ve '-01-01|2015-10-08' > tarsnap-volumes-to-delete
[dan@knew:~] $ wc -l tarsnap-volumes-to-delete 
    1746 tarsnap-volumes-to-delete

Oh wait, I forgot to exclude 2020:

[dan@knew:~] $ cut -f 1 -w /root/tarsnap-knew-archive-list  | egrep -ve '-01-01|2015-10-08|bacula.int.BaculaDatabase.2020' > tarsnap-volumes-to-delete
[dan@knew:~] $ wc -l tarsnap-volumes-to-delete 
    1503 tarsnap-volumes-to-delete
[dan@knew:~] $ 
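The same selection can be written without cut -w, which is a FreeBSD extension. This is a sketch rather than the command I ran: select_for_deletion is a made-up name, and the patterns are anchored a little more tightly than the egrep above.

```shell
# Print the archive names to delete, reading a "-vv" archive list on stdin.
# Keeps: every -01-01 archive, the oldest archive (2015-10-08), and all of 2020.
# Uses awk instead of "cut -w" so it also runs outside FreeBSD.
# select_for_deletion is a hypothetical helper name.
select_for_deletion() {
    awk '{ print $1 }' |
        grep -E -v -e '-01-01$|\.2015-10-08$|\.2020-' |
        sort
}

# Usage: select_for_deletion < /root/tarsnap-knew-archive-list > tarsnap-volumes-to-delete
```

Sorting the names here also makes it easier to see later how far a long delete run has progressed.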

I used an editor to quickly modify that file to look like this:

[dan@knew:~] $ head tarsnap-volumes-to-delete 
#!/bin/sh
tarsnap -d \
-f bacula.int.BaculaDatabase.2018-08-17 \
-f bacula.int.BaculaDatabase.2018-11-08 \
-f bacula.int.BaculaDatabase.2016-05-25 \
-f bacula.int.BaculaDatabase.2018-08-09 \
-f bacula.int.BaculaDatabase.2016-10-12 \
-f bacula.int.BaculaDatabase.2016-01-20 \
-f bacula.int.BaculaDatabase.2019-02-06 \
-f bacula.int.BaculaDatabase.2016-03-18 \
-f bacula.int.BaculaDatabase.2018-01-15 \
[dan@knew:~] $ 

EDIT 2021-09-29: The tarsnap -d \ was missing from the above script until David Newman told me. Thank you.
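The editor step could also be scripted. A minimal sketch, assuming the input file holds one archive name per line; make_delete_script and the output file name are made up for this example.

```shell
# Turn a file of archive names (one per line) into a single "tarsnap -d" script.
# Every line but the last ends in a backslash so the shell sees one command.
# make_delete_script is a hypothetical helper name.
make_delete_script() {
    awk '
        BEGIN { print "#!/bin/sh"; printf "tarsnap -d" }
        NF    { printf " \\\n-f %s", $1 }
        END   { print "" }
    ' "$1"
}

# Usage: make_delete_script tarsnap-volumes-to-delete > delete-archives.sh
```

Generating the header line in the same step avoids the kind of omission noted in the EDIT above.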

This delete will take a while so I started a tmux session. I did a chmod +x on the file.

I started the command and went on to do other things. It is deleting 1500 archives, so I expect it to take a few hours at least.

[dan@knew:~] $ time sudo ./tarsnap-volumes-to-delete

I wish I had sorted that list. I’d easily know where we were.

I know we are on bacula.int.BaculaDatabase.2018-08-19 which is line 836 of 1505.
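Finding that position is a line-number lookup in the script. A sketch, assuming the script has one "-f archive" entry per line as shown earlier; progress is a made-up helper name.

```shell
# Report how far through the delete script a given archive name is.
# Assumes the script has one "-f <archive>" per line, as shown earlier.
# progress is a hypothetical helper name.
progress() {
    script=$1
    archive=$2
    total=$(wc -l < "$script" | tr -d ' ')
    line=$(grep -n -F -e "-f $archive" "$script" | head -n 1 | cut -d : -f 1)
    echo "line ${line:-?} of $total"
}

# Usage: progress ./tarsnap-volumes-to-delete bacula.int.BaculaDatabase.2018-08-19
```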

 $ ps auwwx | grep tmux
dan        78234   0.0  0.0   14344    5872  -  Is   13:15       0:00.36 tmux: server (/tmp//tmux-1001/default) (tmux)

tmux was started at 13:15 and it is now 20:49 – so that’s 7.5 hours to get about half-way through. This should finish overnight.

Night passes….

The next morning I found:

real    819m10.118s
user    372m34.315s
sys     3m23.045s

That is 13 hours and 40 minutes, or about 18 archives every 10 minutes.
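The arithmetic behind those figures, as a small sketch. The inputs are the 1503 archives and the "real" time reported by time; delete_rate is a made-up helper name.

```shell
# Convert the "real" time from time(1) into hours/minutes and a deletion rate.
# Arguments: number of archives, minutes, seconds.
# delete_rate is a hypothetical helper name.
delete_rate() {
    awk -v archives="$1" -v min="$2" -v sec="$3" 'BEGIN {
        minutes = min + sec / 60
        printf "%dh %dm, %.1f archives per 10 minutes\n",
            int(minutes / 60), int(minutes % 60), archives / minutes * 10
    }'
}

delete_rate 1503 819 10.118   # prints 13h 39m, 18.3 archives per 10 minutes
```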

I want to compare before and after disk usage, but I may have to wait until 0000 UTC when the statistics are updated.

The next day (2020-09-10), I found these values:


name                    before    after
bacula dump             96G       59.4GB
bacula configuration    13.7G     4.6MB

That’s telling me that 1503 days of backups required an additional 37GB of storage.

For comparison, the database dump was this size on the indicated dates:


date          size        FreeBSD version
2015-11-03    50.49 GB    10.2-RELEASE-p2
2016-12-22    56.77 GB    10.3-RELEASE-p6
2017-12-02    65.33 GB    11.1-RELEASE-p1
2018-12-25    77.35 GB    11.2-RELEASE
2019-12-25    100.3 GB    12.0-RELEASE-p5
2020-09-10    125.5 GB    -RELEASE-p7

I obtained that data by going back through the old email. You keep your old notifications? I just checked. These bacula notification emails go back to November 2015. Hmm, I also keep those ‘security run output’ emails. They go back to June 2010. Why keep 130,996 old emails? For blog posts like this.

Bandwidth

That large database dump consumed just 1.05GB in bandwidth. Tarsnap sends only the changes. The file is in plain text format.

Storage

I am now using 106.4 GB total. I suspect I can delete some of the older archives there too.

Automation

I should automate the disposal of old archives. One day.
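A possible starting point, as a sketch only: a filter that prints archive names older than a cutoff date while keeping every -01-01 archive. It assumes each name ends in .YYYY-MM-DD, as mine do. The expired_archives name and the cutoff argument are made up for this example, and unlike my manual run it does not keep the oldest archive.

```shell
# Print archive names older than a cutoff date, keeping every -01-01 archive.
# Reads names on stdin; assumes each name ends in .YYYY-MM-DD.
# expired_archives and the cutoff argument are hypothetical, for illustration.
expired_archives() {
    awk -v cutoff="$1" '
        {
            date = substr($1, length($1) - 9)     # trailing YYYY-MM-DD
            if (date ~ /-01-01$/) next            # keep New Year archives
            if (date < cutoff) print $1           # string compare works for ISO dates
        }
    '
}

# Usage (prints names only; deleting them is a separate, deliberate step):
#   tarsnap --list-archives | expired_archives 2020-01-01
```

Keeping the selection separate from the delete makes it easy to review the list before anything is removed.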
