Backups are pretty amazing. The things you can do with them can be both highly entertaining and surprisingly easy.
Today, we will venture into the easy part.
Yesterday was the first Sunday of the month, which, according to my Bacula installation’s schedule, is the day for full backups. Now that those are done, today is copy-to-tape day. I do this step manually because I don’t want the tape library, and the dedicated server that writes to it, running 24×7. Why not run it all day, every day? Cost and heat.
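Kicking the copy off by hand is just a matter of starting the job from bconsole; something like this, where CopyToTape-Full-LTO4 is the job defined below and "yes" merely skips the confirmation prompt:

    *run job=CopyToTape-Full-LTO4 yes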
It is a good idea to have backups. It is even better if your backups are on two different media: that way, you should still be able to restore after a failure of one. In my case, I back up to disk, then copy to tape, and move the tape to another location.
The job for doing the copy looks like this:
Job {
  Name = "CopyToTape-Full-LTO4"
  Type = Copy
  Level = Full
  Pool = FullFile
  NextPool = FullsLTO4
  FileSet = "EmptyCopyToTape"
  Client = crey-fd
  Schedule = "Never"
  Storage = r610
  Messages = Standard

  # Lets us use normal priority, just so other backups can occur while we copy
  # to tape (e.g. mailjail)
  # Priority = 430

  # RunAfterJob = "/usr/local/bacula/dlt-stats-r610"
  # run this on the host system, not the jail. trouble with xpt0 access from within jail
  # the tape drive is now on a remote server

  # Spool Data = yes
  Spool Attributes = yes
  Maximum Concurrent Jobs = 40

  Selection Type = SQL Query
  Selection Pattern = "
    SELECT DISTINCT J.JobId, J.StartTime
      FROM Job J, Pool P
     WHERE P.Name IN ('FullFile', 'MonthlyBackups')
       AND P.PoolId = J.PoolId
       AND J.Type = 'B'
       AND J.JobStatus IN ('T','W')
       AND J.jobBytes > 0
       AND J.JobId NOT IN
           (SELECT PriorJobId
              FROM Job
             WHERE Type IN ('B','C')
               AND Job.JobStatus IN ('T','W')
               AND PriorJobId != 0
               AND Job.PoolId IN (SELECT pool.poolid
                                    FROM pool
                                   WHERE name = 'FullsLTO4'))
       AND J.endtime > current_timestamp - interval '110 day'
     ORDER BY J.StartTime
  "
}
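The copies land in the FullsLTO4 pool named by NextPool above. My actual pool definition is not shown in this post, but as a rough sketch, a destination pool resource looks something like this (the Name and Storage come from the job above; the retention and recycling values here are made-up placeholders, not my real settings):

    Pool {
      Name = FullsLTO4
      Pool Type = Backup
      Storage = r610        # the SD with the tape library, named in the job above
      # placeholder values; my real settings are not shown in this post
      Volume Retention = 1 year
      AutoPrune = yes
      Recycle = yes
    }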
Let’s look at the output of that SQL query to see what we have. Let’s also add in JobBytes to see how large these backups are.
The query
I use SQL to determine which jobs to copy; it gives me more flexibility. Here is what would be copied, and in what order, should I run that job right now:
SELECT DISTINCT J.JobId, J.StartTime, J.jobbytes, pg_size_pretty(J.JobBytes)
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId = J.PoolId
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes > 0
   AND J.JobId NOT IN
       (SELECT PriorJobId
          FROM Job
         WHERE Type IN ('B','C')
           AND Job.JobStatus IN ('T','W')
           AND PriorJobId != 0
           AND Job.PoolId IN (SELECT pool.poolid
                                FROM pool
                               WHERE name = 'FullsLTO4'))
   AND J.endtime > current_timestamp - interval '110 day'
 ORDER BY J.StartTime;

 jobid  |      starttime      |   jobbytes   | pg_size_pretty
--------+---------------------+--------------+----------------
 264318 | 2017-09-03 03:04:05 |  12931379115 | 12 GB
 264317 | 2017-09-03 03:04:06 | 303397514752 | 283 GB
 264316 | 2017-09-03 03:04:07 | 612572982936 | 571 GB
 264328 | 2017-09-03 03:05:01 |     37254946 | 36 MB
 264321 | 2017-09-03 03:05:02 |    509798405 | 486 MB
 264327 | 2017-09-03 03:05:02 |       215605 | 211 kB
 264329 | 2017-09-03 03:05:02 |    375857649 | 358 MB
 264333 | 2017-09-03 03:05:02 |     62354023 | 59 MB
 264319 | 2017-09-03 03:05:03 |    537822887 | 513 MB
 264331 | 2017-09-03 03:05:03 |     22460534 | 21 MB
 264332 | 2017-09-03 03:06:23 |   1723428153 | 1644 MB
 264334 | 2017-09-03 03:06:23 |   9671149409 | 9223 MB
 264330 | 2017-09-03 03:10:43 |   2183339517 | 2082 MB
 264322 | 2017-09-03 03:14:49 |     10758898 | 10 MB
 264323 | 2017-09-03 03:16:09 |   6304145819 | 6012 MB
 264320 | 2017-09-03 03:17:16 | 104445006754 | 97 GB
 264326 | 2017-09-03 05:00:59 |   5497202777 | 5243 MB
 264335 | 2017-09-03 15:14:50 |   1421915823 | 1356 MB
 264336 | 2017-09-03 15:44:58 |  12155344299 | 11 GB
 264324 | 2017-09-03 21:41:27 |    787287519 | 751 MB
 264325 | 2017-09-03 21:43:32 |   6297826581 | 6006 MB
 264348 | 2017-09-03 23:31:00 | 183112981978 | 171 GB
 264338 | 2017-09-04 01:13:35 |  10501648063 | 10015 MB
 264337 | 2017-09-04 06:14:48 |  63896895900 | 60 GB
(24 rows)
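As an aside, the same candidate list can be wrapped in a sum to see how much data one run would push to tape in total. A quick sketch against the catalog, reusing the WHERE clause from above (output not shown):

    -- total size of all jobs the copy job would select
    SELECT count(*)                              AS jobs,
           pg_size_pretty(sum(JobBytes)::bigint) AS total
      FROM (SELECT DISTINCT J.JobId, J.JobBytes
              FROM Job J, Pool P
             WHERE P.Name IN ('FullFile', 'MonthlyBackups')
               AND P.PoolId = J.PoolId
               AND J.Type = 'B'
               AND J.JobStatus IN ('T','W')
               AND J.jobBytes > 0
               AND J.JobId NOT IN
                   (SELECT PriorJobId
                      FROM Job
                     WHERE Type IN ('B','C')
                       AND Job.JobStatus IN ('T','W')
                       AND PriorJobId != 0
                       AND Job.PoolId IN (SELECT pool.poolid
                                            FROM pool
                                           WHERE name = 'FullsLTO4'))
               AND J.endtime > current_timestamp - interval '110 day') AS candidates;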
Typically, one last straggler job is left running for some time after all the other jobs have finished. I like to keep as many concurrent jobs running as I can, in an attempt to saturate the tape drive and avoid the stop/start pulses that occur when it runs out of data.
Let’s try another order instead.
Sort by size
SELECT DISTINCT J.JobId, J.StartTime, J.jobbytes, pg_size_pretty(J.JobBytes)
  FROM Job J, Pool P
 WHERE P.Name IN ('FullFile', 'MonthlyBackups')
   AND P.PoolId = J.PoolId
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes > 0
   AND J.JobId NOT IN
       (SELECT PriorJobId
          FROM Job
         WHERE Type IN ('B','C')
           AND Job.JobStatus IN ('T','W')
           AND PriorJobId != 0
           AND Job.PoolId IN (SELECT pool.poolid
                                FROM pool
                               WHERE name = 'FullsLTO4'))
   AND J.endtime > current_timestamp - interval '110 day'
 ORDER BY J.JobBytes DESC;

 jobid  |      starttime      |   jobbytes   | pg_size_pretty
--------+---------------------+--------------+----------------
 264316 | 2017-09-03 03:04:07 | 612572982936 | 571 GB
 264317 | 2017-09-03 03:04:06 | 303397514752 | 283 GB
 264348 | 2017-09-03 23:31:00 | 183112981978 | 171 GB
 264320 | 2017-09-03 03:17:16 | 104445006754 | 97 GB
 264337 | 2017-09-04 06:14:48 |  63896895900 | 60 GB
 264318 | 2017-09-03 03:04:05 |  12931379115 | 12 GB
 264336 | 2017-09-03 15:44:58 |  12155344299 | 11 GB
 264338 | 2017-09-04 01:13:35 |  10501648063 | 10015 MB
 264334 | 2017-09-03 03:06:23 |   9671149409 | 9223 MB
 264323 | 2017-09-03 03:16:09 |   6304145819 | 6012 MB
 264325 | 2017-09-03 21:43:32 |   6297826581 | 6006 MB
 264326 | 2017-09-03 05:00:59 |   5497202777 | 5243 MB
 264330 | 2017-09-03 03:10:43 |   2183339517 | 2082 MB
 264332 | 2017-09-03 03:06:23 |   1723428153 | 1644 MB
 264335 | 2017-09-03 15:14:50 |   1421915823 | 1356 MB
 264324 | 2017-09-03 21:41:27 |    787287519 | 751 MB
 264319 | 2017-09-03 03:05:03 |    537822887 | 513 MB
 264321 | 2017-09-03 03:05:02 |    509798405 | 486 MB
 264329 | 2017-09-03 03:05:02 |    375857649 | 358 MB
 264333 | 2017-09-03 03:05:02 |     62354023 | 59 MB
 264328 | 2017-09-03 03:05:01 |     37254946 | 36 MB
 264331 | 2017-09-03 03:05:03 |     22460534 | 21 MB
 264322 | 2017-09-03 03:14:49 |     10758898 | 10 MB
 264327 | 2017-09-03 03:05:02 |       215605 | 211 kB
(24 rows)
I am going to try this order, mostly because I hope to saturate the 10G fiber connection between the bacula-sd where the backups sit on disk (the FullFile pool) and the bacula-sd on r610, where the tape library is located.
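Switching to this order is a one-line change to the Selection Pattern in the job definition shown earlier; the final clause changes from

    ORDER BY J.StartTime
    -- to
    ORDER BY J.JobBytes DESC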
I just started this job. I will monitor progress and report back with any issues.
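While it runs, progress can be watched from bconsole by asking the storage daemon what it is up to; something like this, with r610 being the Storage resource named in the job, produces output of the kind shown in the update below:

    *status storage=r610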
Update 1
The jobs now writing to tape are:
Running Jobs:
Writing: Incremental Backup job slocum_jail_snapshots JobId=264380 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=407 Bytes=43,766,102 AveBytes/sec=7,294,350 LastBytes/sec=7,294,350
    FDReadSeqNo=2,982 in_msg=2206 out_msg=5 fd=23
Writing: Incremental Backup job knew_jail_snapshots JobId=264382 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=1,409 Bytes=43,161,404 AveBytes/sec=7,193,567 LastBytes/sec=7,193,567
    FDReadSeqNo=7,937 in_msg=5459 out_msg=5 fd=28
Writing: Incremental Backup job dent JobId=264384 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=672 Bytes=40,650,772 AveBytes/sec=6,775,128 LastBytes/sec=6,775,128
    FDReadSeqNo=5,782 in_msg=4047 out_msg=5 fd=30
Writing: Incremental Backup job BackupCatalog JobId=264388 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=1 Bytes=42,860,675 AveBytes/sec=7,143,445 LastBytes/sec=7,143,445
    FDReadSeqNo=664 in_msg=663 out_msg=5 fd=29
Writing: Incremental Backup job zuul_jail_snapshots JobId=264386 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=119 Bytes=41,492,440 AveBytes/sec=6,915,406 LastBytes/sec=6,915,406
    FDReadSeqNo=1,223 in_msg=1026 out_msg=5 fd=33
Writing: Incremental Backup job supernews_FP_msgs JobId=264390 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=2,728 Bytes=41,563,952 AveBytes/sec=6,927,325 LastBytes/sec=6,927,325
    FDReadSeqNo=24,710 in_msg=16540 out_msg=5 fd=32
Writing: Incremental Backup job supernews JobId=264392 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=2,608 Bytes=41,647,917 AveBytes/sec=6,941,319 LastBytes/sec=6,941,319
    FDReadSeqNo=15,841 in_msg=10613 out_msg=5 fd=31
Writing: Incremental Backup job svn_everything JobId=264396 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=874 Bytes=43,871,136 AveBytes/sec=7,311,856 LastBytes/sec=7,311,856
    FDReadSeqNo=7,700 in_msg=5343 out_msg=5 fd=22
Writing: Incremental Backup job mailjail_snapshot JobId=264394 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=472 Bytes=39,419,836 AveBytes/sec=6,569,972 LastBytes/sec=6,569,972
    FDReadSeqNo=4,508 in_msg=3188 out_msg=5 fd=24
Writing: Incremental Backup job slocum_home JobId=264400 Volume="000031L4"
    pool="FullsLTO4" device="LTO_0" (/dev/nsa0)
    spooling=0 despooling=0 despool_wait=0
    Files=54 Bytes=40,756,369 AveBytes/sec=6,792,728 LastBytes/sec=6,792,728
    FDReadSeqNo=954 in_msg=843 out_msg=5 fd=25
====