Oct 04 2012
 

I have been using Bacula since early 2004. That’s nearly 9 years of great backups. Back in early 2010, I set up a multi-terabyte system in my basement with commodity hardware. Today, after about 18 months of backups, it’s starting to fill up. Now is the time to start restricting the creation of new Volumes in order to force the recycling of older expired (but still on disk) Volumes. NOTE: actually, I did not have to implement that bit… I found free disk space.

The first step: how much space is each pool using now?

bacula=#   SELECT P.name,
bacula-#          MT.mediatype,
bacula-#          count(M.mediaid),
bacula-#          pg_size_pretty(sum(M.volbytes)::bigint)
bacula-#     FROM pool P LEFT OUTER JOIN media M
bacula-#       ON P.poolid = M.poolid
bacula-#          LEFT OUTER JOIN mediatype MT ON M.mediatype = MT.mediatype
bacula-# GROUP BY P.name, MT.mediatype
bacula-# ORDER BY sum(M.volbytes) DESC;
        name        | mediatype | count | pg_size_pretty
--------------------+-----------+-------+----------------
 Incrementals       | DLT       |   126 | 6668 GB
 Fulls              | DLT       |   136 | 5548 GB
 FullFile           | File      |  1078 | 5387 GB
 Default            | DLT       |    81 | 5137 GB
 MegaFilePool       | File      |   814 | 4070 GB
 FullsFile          | File      |   241 | 1199 GB
 Differentials      | DLT       |    26 | 1068 GB
 IncrFile           | File      |    95 | 471 GB
 FullFileNoNextPool | File      |    92 | 455 GB
 IncrFileNoNextPool | File      |    79 | 394 GB
 FilePool           | File      |    66 | 325 GB
 DiffFile           | File      |    46 | 227 GB
 Scratch            | DLT       |     2 | 147 GB
 FullBackupsFile    | File      |     1 | 2161 MB
(14 rows)

I can ignore the DLT type, because those are tape backups. Thus, I want only the File backups.

What I notice here is FullsFile and FullFile. I suspect FullsFile was a typo which was later corrected, yet there are 241 Volumes in the FullsFile Pool.

How old are the backups in this pool?

bacula=# SELECT M.volumename, M.lastwritten from media M where M.poolid = (SELECT P.poolid from pool P where P.name = 'FullsFile') ORDER BY lastwritten ASC LIMIT 2;
  volumename   |     lastwritten
---------------+---------------------
 FileAuto-0275 | 2010-03-07 21:37:20
 FileAuto-0330 | 2010-03-14 06:45:15
(2 rows)

bacula=# SELECT M.volumename, M.lastwritten from media M where M.poolid = (SELECT P.poolid from pool P where P.name = 'FullsFile') ORDER BY lastwritten DESC LIMIT 2;
  volumename   |     lastwritten
---------------+---------------------
 FileAuto-1352 | 2010-12-01 12:30:33
 FileAuto-1340 | 2010-11-30 08:25:57
(2 rows)

Backups in this pool range from 2010-03-07 to 2010-12-01. Anything here is at least 22 months old. I can safely get rid of this knowing that I have nearly two years of backups in the FullFile pool:

bacula=# SELECT M.volumename, M.lastwritten from media M where M.poolid = (SELECT P.poolid from pool P where P.name = 'FullFile') ORDER BY lastwritten ASC LIMIT 2;
  volumename   |     lastwritten
---------------+---------------------
 FullAuto-1361 | 2010-12-05 05:57:23
 FullAuto-1369 | 2010-12-05 06:24:14
(2 rows)

bacula=# SELECT M.volumename, M.lastwritten from media M where M.poolid = (SELECT P.poolid from pool P where P.name = 'FullFile') ORDER BY lastwritten DESC LIMIT 2;
  volumename   |     lastwritten
---------------+---------------------
 FullAuto-3279 | 2012-10-03 09:37:57
 FullAuto-3278 | 2012-10-03 09:31:22
(2 rows)
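
The oldest and newest Volume of every pool can also be fetched in one pass, instead of two queries per pool. What follows is only a sketch, not part of the original session: it assumes the same catalog columns used in the queries above (pool.name, media.volbytes, media.lastwritten, media.mediatype), and the function name file_pool_ranges_sql is made up for illustration. It just prints the SQL, so the query can be read before it goes anywhere near psql.

```shell
#!/bin/sh
# file_pool_ranges_sql (hypothetical helper): print a query that lists, for
# each pool holding File Volumes, the Volume count, the total size, and the
# first and last write times. Tape (DLT) Volumes are filtered out.
file_pool_ranges_sql() {
    cat <<'SQL'
SELECT P.name,
       count(M.mediaid)                        AS volumes,
       pg_size_pretty(sum(M.volbytes)::bigint) AS size,
       min(M.lastwritten)                      AS first_written,
       max(M.lastwritten)                      AS last_written
  FROM pool P JOIN media M
    ON P.poolid = M.poolid
 WHERE M.mediatype = 'File'
 GROUP BY P.name
 ORDER BY max(M.lastwritten);
SQL
}
```

Something like file_pool_ranges_sql | psql bacula should then show, in a single table, which File pools stopped receiving data and when. The inner join means pools with no Volumes at all are left out.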

When I look at /usr/local/etc/bacula-dir.conf, I find more evidence that this pool is no longer used. In fact, this is the only reference to this pool in this conf file.

#
# probably not used any more
#

Pool {
  Name             = FullsFile
  Pool Type        = Backup
  Recycle          = yes
  AutoPrune        = yes
  Volume Retention = 3 years
  Storage          = MegaFile
  Next Pool        = Fulls

  Maximum Volume Bytes = 5G

  LabelFormat = "FullsAuto-"
}

Looking further at the bacula-dir.conf file, I find references to other Pools no longer in use. In all, we have:

  • FilePool – 325 GB
  • FullBackupsFile – 2161 MB
  • FullsFile – 1199 GB
  • MegaFilePool – 4070 GB

Using the same SQL query approach as shown above, I see that those pools ceased being used in late 2010, with FilePool not being used since 2009. I think I can free up 5596 GB of space… That’s a fair bit. I can just delete those pools, remove the files, and be done with it.
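
The 5596 GB figure can be checked by adding up the four pool sizes listed above. This is a small sketch under two assumptions: the sizes are typed in by hand from the pg_size_pretty column, and MB is converted at 1024 MB per GB, which is how pg_size_pretty scales its units. The function name total_gb is made up for illustration.

```shell
#!/bin/sh
# total_gb (hypothetical helper): read "size unit" pairs on stdin, where unit
# is GB or MB, and print the rounded total in GB.
total_gb() {
    awk '
        $2 == "GB" { total += $1 }          # already in GB
        $2 == "MB" { total += $1 / 1024 }   # convert MB to GB
        END        { printf "%.0f GB\n", total }
    '
}
```

Feeding it the four sizes, for example with printf '325 GB\n2161 MB\n1199 GB\n4070 GB\n' | total_gb, prints 5596 GB. Since pg_size_pretty has already rounded each pool, the total is approximate.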

Now, let’s purge each of those Volumes (which removes all records of Jobs and Files from the database). To do this, we will use a shell script.

First, let’s get a list of all the Volumes to purge:

echo "SELECT M.volumename
    FROM pool P LEFT OUTER JOIN media M
      ON P.poolid = M.poolid
   WHERE P.name in ('FilePool', 'FullBackupsFile', 'FullsFile', 'MegaFilePool')
ORDER BY volumename;

" | psql --tuples-only --no-align bacula

We can feed that output into xargs, and then back into Bacula.

Actually, I think I want to try pruning first. Then let’s see if there are any non-expired Volumes. Just for my own curiosity. Purging deletes data from the database regardless of retention specifications. Pruning, on the other hand, respects retention.

Here is the next draft of the shell script to prune Bacula Volumes:

echo "SELECT M.volumename
    FROM pool P LEFT OUTER JOIN media M
      ON P.poolid = M.poolid
   WHERE P.name in ('FilePool', 'FullBackupsFile', 'FullsFile', 'MegaFilePool')
ORDER BY volumename;

" | psql --tuples-only --no-align bacula | head -6 | xargs -n 1 -I % echo 'prune volume="%" yes'
prune volume="FileAuto-0148" yes
prune volume="FileAuto-0149" yes
prune volume="FileAuto-0150" yes
prune volume="FileAuto-0151" yes
prune volume="FileAuto-0153" yes
prune volume="FileAuto-0154" yes

xargs takes the incoming lines and, one at a time, does something with each of them. In this case, we will pipe the output to bconsole. Much like this example:

[dan@ngaio:~] $ echo 'prune volume="FileAuto-0151" yes' | bconsole
Connecting to Director bacula.unixathome.org:9101
1000 OK: bacula-dir Version: 5.2.6 (21 February 2012)
Enter a period to cancel a command.
prune volume="FileAuto-0151" yes
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
The current Volume retention period is: 1 year
There are no more Jobs associated with Volume "FileAuto-0151". Marking it purged.
You have messages.
[dan@ngaio:~] $

So, here goes a simple test:

[dan@ngaio:~] $ echo "SELECT M.volumename
    FROM pool P LEFT OUTER JOIN media M
      ON P.poolid = M.poolid
   WHERE P.name in ('FilePool', 'FullBackupsFile', 'FullsFile', 'MegaFilePool')
ORDER BY volumename;

" | psql --tuples-only --no-align bacula | head -6 | xargs -n 1 -I % echo 'prune volume="%" yes' | bconsole
Connecting to Director bacula.unixathome.org:9101
1000 OK: bacula-dir Version: 5.2.6 (21 February 2012)
Enter a period to cancel a command.
prune volume="FileAuto-0148" yes
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
The current Volume retention period is: 1 year
prune volume="FileAuto-0149" yes
The current Volume retention period is: 1 year
prune volume="FileAuto-0150" yes
The current Volume retention period is: 1 year
prune volume="FileAuto-0151" yes
The current Volume retention period is: 1 year
prune volume="FileAuto-0153" yes
The current Volume retention period is: 1 year
There are no more Jobs associated with Volume "FileAuto-0153". Marking it purged.
prune volume="FileAuto-0154" yes
The current Volume retention period is: 1 year
There are no more Jobs associated with Volume "FileAuto-0154". Marking it purged.
[dan@ngaio:~] $

Now it’s time to remove the ‘head -6’ and run this for all the Volumes in question. I will not show the output, but there was a lot of data still within the retention period. It is important to note that retention period refers to File, Job, and Volume information, not to backups.

My curiosity satisfied, it was time to purge the data. I did this by changing the ‘prune’ in the previous command to ‘purge’, then let it run. Purging takes considerably longer than pruning.
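
Since the pipelines differ only in the bconsole verb, the command generation can be pulled into a small function. This is a sketch rather than what was actually run: the function name bconsole_cmds is made up, it replaces the xargs/echo step with a plain read loop, and it skips empty lines in case the query returns a pool with no Volumes. The command text it prints is the same as in the transcripts above.

```shell
#!/bin/sh
# bconsole_cmds ACTION (hypothetical helper): read Volume names on stdin, one
# per line, and print one bconsole command per name.
# ACTION must be prune, purge, or delete (delete is used further down).
bconsole_cmds() {
    action=$1
    case "$action" in
        prune|purge|delete) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    while IFS= read -r volume; do
        # Ignore empty lines rather than emitting volume=""
        [ -n "$volume" ] || continue
        printf '%s volume="%s" yes\n' "$action" "$volume"
    done
}
```

Because the function only prints text, its output can be reviewed first (for example through head -6) and piped into bconsole once it looks right.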

The next step is to delete those Volumes from disk. This is a simple rm command. However, this step is complicated by the database being on one server and the Volumes on another. The solution: files.

Let’s capture the list of Volume names into a file and copy that file to the other server:

[dan@ngaio:~] $ echo "SELECT M.volumename
    FROM pool P LEFT OUTER JOIN media M
      ON P.poolid = M.poolid
   WHERE P.name in ('FilePool', 'FullBackupsFile', 'FullsFile', 'MegaFilePool')
ORDER BY volumename;

" | psql --tuples-only --no-align bacula  > ListOfVolumes
[dan@ngaio:~] $ scp ListOfVolumes kraken:
ListOfVolumes                              100%   15KB  15.4KB/s   00:00
[dan@ngaio:~] $

Here is the command which let me delete the volume files from the file system. Use with care:

cat ~/ListOfVolumes | xargs -n 1 -I % sudo rm /storage/compressed/bacula/volumes/%
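
Because rm is unforgiving, it may be worth a dry run first. The sketch below is an assumption about how one might do that, not something from the original session: the function name check_volume_files is made up, and it only prints the paths that exist as regular files under the given directory, reporting anything else on stderr. Names containing a slash are skipped so that a stray entry cannot point outside the Volume directory.

```shell
#!/bin/sh
# check_volume_files DIR (hypothetical helper): read Volume names on stdin and
# print the full path of each one that exists as a regular file under DIR.
# Nothing is deleted; problems are reported on stderr.
check_volume_files() {
    dir=$1
    while IFS= read -r volume; do
        [ -n "$volume" ] || continue
        case "$volume" in
            */*) echo "skipping suspicious name: $volume" >&2; continue ;;
        esac
        if [ -f "$dir/$volume" ]; then
            printf '%s\n' "$dir/$volume"
        else
            echo "missing: $dir/$volume" >&2
        fi
    done
}
```

Running check_volume_files /storage/compressed/bacula/volumes < ~/ListOfVolumes lists what would be removed. Once that list looks right, it can be handed to rm in the same way as the command above.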

This command deletes the Volumes from the Catalog:

 $ echo "SELECT M.volumename
    FROM pool P LEFT OUTER JOIN media M
      ON P.poolid = M.poolid
   WHERE P.name in ('FilePool', 'FullBackupsFile', 'FullsFile', 'MegaFilePool')
ORDER BY volumename;

" | psql --tuples-only --no-align bacula | xargs -n 1 -I % echo 'delete volume="%" yes' | bconsole

However, this did not free up as much disk space as I expected. That’s when I discovered that I had a number of unused ZFS snapshots. I will deal with that in another post.


  One Response to “Bacula volumes – running low on disk space”

  1. Hi Dan,

    Being new to Bacula I enjoy reading your diary of experiences with Bacula.

    I’d be grateful if you could help me with a Bacula ‘problem’. I have a set of backup jobs that run each night, all writing to a single disk volume. The last backup ‘job’ in the schedule references an ‘after job’ that sets the status of the volume to “used”. The next nightly backup will use the next available volume. This works really well. What I’d like to do is create a new job, to run before I update the volume to “used”, that copies the volume to tape. Is it possible to identify the volume that has just been written to and use this value for a copy job? I’m using Bacula with a PostgreSQL DB. Thanks in advance,